Assembly language is great for writing really fast, efficient routines where hard-realtime timing constraints have to be met. Making a tachometer using a microcontroller running from a TTL oscillator is one example: if coded correctly, the microcontroller can be relied on to measure timing with great accuracy.
To do this, it is often necessary to write check-and-count routines so that the loop executes in a fixed number of clock cycles, no matter what the current state of the count happens to be. Implementing carry logic takes extra processor cycles, so these must be compensated for if the loop is to run isochronously.
On low-end to midrange 8-bit PIC microcontrollers such as the 16F88, the only available native data type is an 8-bit integer. These can be combined to produce 32-bit integers, if the next-higher byte is incremented each time the lower byte rolls back to 0x00. Done properly, the loop will always execute in the same amount of time, no matter how many of the registers roll over for a given count.
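To make the carry scheme concrete, here is a minimal sketch in Python (a host-side model, not PIC code) of four 8-bit registers chained into one 32-bit count. The names `increment32` and `to_int` are made up for illustration; the list holds the low byte first.

```python
def increment32(count):
    """Increment a list of four 8-bit registers in place, low byte first.

    Moves on to the next-higher byte only when the current byte has just
    rolled over to 0x00, which is the carry rule described above.
    """
    for i in range(4):
        count[i] = (count[i] + 1) & 0xFF   # 8-bit wraparound
        if count[i] != 0:                  # no rollover, so no carry
            break
    return count


def to_int(count):
    """Combine the four bytes (low byte first) into one integer."""
    return count[0] | (count[1] << 8) | (count[2] << 16) | (count[3] << 24)


regs = [0xFF, 0xFF, 0x00, 0x00]   # holds 0x0000FFFF
increment32(regs)
print(hex(to_int(regs)))          # 0x10000
```

The model says nothing about timing; it only shows that incrementing the next byte on each rollover gives the same result as a true 32-bit increment.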
Here is the relevant portion of the code for a tachometer project I’m working on. It uses a 32-bit counter, allowing just over four billion counts (2^32) before rolling over. This provides far more range, and so much finer resolution over a long interval, than would be possible with a single 8-bit counter. The extra accuracy is worth the extra coding effort, in this case…
CycleCount:
    goto $+1
Loop1:
    goto $+1
Loop2:
    goto $+1
Loop3:
    btfss PORTA, 0      ;If A.0 is low, then
    goto HandleEvent    ;skip out
    incfsz Count0, f    ;Increment the low byte.
    goto CycleCount     ;Low byte isn't zero. Go back to beginning.
    incfsz Count1, f    ;Increment the 2nd byte.
    goto Loop1          ;2nd byte isn't zero. Go back -- two cycles later.
    incfsz Count2, f    ;Increment the 3rd byte.
    goto Loop2          ;3rd byte isn't zero. Go back four cycles later.
    incf Count3, f      ;Increment the high byte.
    goto Loop3          ;Go back six cycles later.
Except for the last pass, where it jumps to the HandleEvent routine, this code will always execute in 44 clock cycles per count. An instruction on a PIC16 takes four clock cycles (one instruction cycle), or eight if it modifies the program counter, as a taken goto or a skip does. Each pass is therefore 11 instruction cycles. If Count0 is incremented to a nonzero number, execution passes back to the top of the routine, the CycleCount: label. If Count0 rolls over but Count1 does not, execution passes to the Loop1: label instead, which skips one goto $+1 and so recovers the two instruction cycles that were used to increment Count1. Likewise, if Count1 also rolls over and Count2 is incremented to a nonzero number, control passes to Loop2:, recovering four instruction cycles. If Count2 rolls over as well, Count3 is incremented and control passes to Loop3:, recovering six.
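The cycle bookkeeping can be checked with a small Python model of the loop. This is only a sketch under stated assumptions: the input pin stays high so the loop never exits, every instruction costs one instruction cycle, and a taken goto or a skip costs two. The function names are hypothetical.

```python
def sample_times(n_samples, start=(0, 0, 0, 0)):
    """Return the instruction-cycle time at which each btfss executes.

    Assumes PORTA.0 stays high, so the loop never leaves for HandleEvent.
    """
    count = list(start)      # Count0..Count3, low byte first
    t = 0                    # elapsed instruction cycles
    pads = 3                 # entering at CycleCount: three "goto $+1" pads
    times = []
    while len(times) < n_samples:
        t += 2 * pads                    # each goto $+1 costs 2 cycles
        times.append(t)                  # btfss samples the pin here
        t += 2                           # btfss skips the exit goto: 2 cycles
        for i in range(3):               # incfsz Count0, Count1, Count2
            count[i] = (count[i] + 1) & 0xFF
            if count[i] != 0:
                t += 1 + 2               # incfsz without skip, then taken goto
                pads = 3 - i             # CycleCount, Loop1 or Loop2
                break
            t += 2                       # byte rolled over: incfsz skips
        else:
            count[3] = (count[3] + 1) & 0xFF
            t += 1 + 2                   # incf Count3, then goto Loop3
            pads = 0
    return times


def intervals(times):
    """Set of distinct gaps between consecutive samples."""
    return {b - a for a, b in zip(times, times[1:])}


print(intervals(sample_times(70_000)))                             # {11}
print(intervals(sample_times(4, start=(0xFE, 0xFF, 0xFF, 0x00))))  # {11}
```

The first run covers rollovers of Count0 and Count1; the second starts just before all three lower bytes roll over at once. In both cases the pin is sampled every 11 instruction cycles, or 44 clock cycles.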
Since the code is constructed this way, execution will always take 44 clock cycles per loop. If the count shows one million, exactly 44 million cycles (less two cycles) will have passed since the loop started timing. This allows the system accuracy to be readily calculated — and often makes the system clock (a 50PPM, 20MHz TTL oscillator, in this case) the limiting factor in accuracy.
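As a rough illustration of that calculation, the Python sketch below converts a loop count into elapsed time and applies the oscillator tolerance. The constants come from the text; the function names are made up, and the small fixed offset at the start and end of the measurement is ignored.

```python
F_OSC_HZ = 20_000_000     # 20 MHz oscillator
CLOCKS_PER_COUNT = 44     # clock cycles per pass of the counting loop
TOLERANCE_PPM = 50        # oscillator tolerance


def elapsed_seconds(count):
    """Nominal elapsed time for a given loop count."""
    return count * CLOCKS_PER_COUNT / F_OSC_HZ


def elapsed_bounds(count):
    """Earliest and latest elapsed time allowed by the oscillator tolerance."""
    t = elapsed_seconds(count)
    err = t * TOLERANCE_PPM / 1_000_000
    return t - err, t + err


print(elapsed_seconds(1_000_000))   # 2.2
low, high = elapsed_bounds(1_000_000)
print(low, high)
```

For a count of one million, the nominal time is 2.2 seconds, and the 50 ppm tolerance puts the true value within about 110 microseconds either side of that.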
At 20 MHz (or 5 MIPS), one clock cycle is 50 ns, so each 44-cycle pass takes 2.2 microseconds. This code therefore allows timing of events to within 2.2 microseconds (give or take 50 ppm). Not half bad for a chip costing $1.83 or less.
Now for the hard part: implementing 32-bit integer division with a processor that doesn’t even natively do multiplication…!