Since microprocessors typically run at clock speeds of several MHz, one important software task is to implement delays, to give peripherals (or users) time to respond to data.
There are several methods of producing delays, and each has advantages and disadvantages. One of the simplest methods is the spinloop — so named because the processor simply “spins” through the same small segment of code many times, taking up a given amount of time before continuing on to the next task.
Each instruction takes a fixed amount of time, and only a limited number of memory locations are available for instructions (65,536 in the 16-bit address space of a standard Z80; 256 on the DrACo/Z80 with its 8-bit address space), so loops become necessary to implement all but the shortest delays. A counter (typically a register) is set to a certain value at the beginning of the loop and is decremented once on each pass. After each decrement, the value in the register is checked to see if it is zero. If so, the processor breaks out of the loop and continues.
Even using 16-bit registers, though, a single loop gives only relatively short delays, on the order of a million clock cycles at most (since the largest value that can be stored in 16 bits is 65,535). To get longer delays, multiple loops are nested, one inside the other. This multiplies the loop counts: if both the inner and outer loops have a count of 1,000, the processor will execute the inner 1,000-count loop 1,000 times, resulting in a much longer delay (the body of the inner loop runs one million times in total).
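To make the multiplication concrete, here is a small Python model of two nested countdown loops. It is only an illustration for nonzero counts; the function name is made up for this sketch and nothing here is Z80 code.

```python
def count_inner_passes(outer_count, inner_count):
    """Count how many times the innermost body runs in a two-level countdown."""
    inner_passes = 0
    outer = outer_count
    while outer != 0:
        inner = inner_count          # the inner counter is reloaded on every outer pass
        while inner != 0:
            inner -= 1
            inner_passes += 1
        outer -= 1
    return inner_passes

print(count_inner_passes(1000, 1000))   # 1000000
```

The reload of the inner counter at the top of each outer pass is what turns two counts of 1,000 into one million passes rather than two thousand.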
By varying the values loaded into the registers, the length of the delay can be set to any reasonable value. With a system clock of 1 MHz, a delay of one million clock cycles lasts one second, suitable for blinking an LED. A delay of a thousand clock cycles lasts one millisecond, a useful interval between sending commands or data to an LCD. (Because each Z80 instruction takes several clock cycles, the number of instructions executed is considerably smaller than the number of cycles.)
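The conversion between a wanted delay and a clock-cycle budget is simple arithmetic. As a minimal sketch (the helper name is hypothetical), in Python:

```python
def cycles_for_delay(delay_seconds, clock_hz):
    """Clock cycles that elapse during the requested delay at the given clock rate."""
    return round(delay_seconds * clock_hz)

print(cycles_for_delay(1.0, 1_000_000))     # 1000000 cycles: one second at 1 MHz
print(cycles_for_delay(0.001, 1_000_000))   # 1000 cycles: one millisecond at 1 MHz
print(cycles_for_delay(0.001, 4_000_000))   # 4000 cycles: one millisecond at 4 MHz
```

The loop counts then have to be chosen so that the cycles spent in the loop add up to roughly this budget.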
In Z80 assembly code, a nested delay loop would look something like this:
        LD   BC, 1000h   ;Loads BC with hex 1000
Outer:
        LD   DE, 1000h   ;Loads DE with hex 1000
Inner:
        DEC  DE          ;Decrements DE
        LD   A, D        ;Copies D into A
        OR   E           ;Bitwise OR of E with A (now, A = D | E)
        JP   NZ, Inner   ;Jumps back to Inner: label if A is not zero
        DEC  BC          ;Decrements BC
        LD   A, B        ;Copies B into A
        OR   C           ;Bitwise OR of C with A (now, A = B | C)
        JP   NZ, Outer   ;Jumps back to Outer: label if A is not zero
        RET              ;Return from call to this subroutine
This will produce a delay roughly proportional to BC * DE. Since both are 16-bit registers, you could set both to 0xFFFF, executing the inner loop some 4.3 billion times. Using lower numbers reduces the delay: 0x1000 for both, as in the example above, gives about 16.8 million inner loop executions. (The exact formula depends on the number of cycles needed for each instruction, but often that kind of accuracy isn’t needed.)
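For cases where the exact figure does matter, the Python below is a rough sketch that adds up the clock cycles (T-states) for the listing above, using the per-instruction timings from the standard Z80 timing tables. The function names are illustrative only, and the result assumes no wait states or interrupts and leaves out the CALL that enters the routine.

```python
# T-states per instruction, from the standard Z80 timing tables.
T_LD_RR_NN = 10   # LD BC,nn / LD DE,nn
T_DEC_RR = 6      # DEC BC / DEC DE
T_LD_R_R = 4      # LD A,D / LD A,B
T_OR_R = 4        # OR E / OR C
T_JP_CC = 10      # JP NZ,nn (same cost whether or not the jump is taken)
T_RET = 10        # RET

def passes(count):
    """Passes made by a decrement-then-test loop; a start value of 0 wraps to 65,536."""
    return count if count != 0 else 0x10000

def nested_delay_tstates(bc, de):
    """Total T-states for the nested loop, from the first LD through the RET."""
    test_and_jump = T_DEC_RR + T_LD_R_R + T_OR_R + T_JP_CC   # 24 T-states per pass
    inner = passes(de) * test_and_jump
    outer_pass = T_LD_RR_NN + inner + test_and_jump          # reload DE, run inner, test BC
    return T_LD_RR_NN + passes(bc) * outer_pass + T_RET

def delay_seconds(bc, de, clock_hz):
    """Delay in seconds at the given clock frequency."""
    return nested_delay_tstates(bc, de) / clock_hz

print(nested_delay_tstates(0x1000, 0x1000))        # 402792468
print(delay_seconds(0x1000, 0x1000, 4_000_000))    # a little over 100 seconds
```

Under those assumptions, loading hex 1000 into both register pairs costs about 403 million T-states: roughly 100 seconds at 4 MHz, or nearly seven minutes at 1 MHz.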
The LD and OR instructions are there to logically OR the two halves of each 16-bit register pair together. (On the Z80, incrementing or decrementing a 16-bit register pair does not affect the zero flag, so without them the JP NZ would act on the flags left by the last 8-bit operation, which is not what we want. ORing the two bytes of each pair through the accumulator sets the zero flag only when both bytes are zero, and the JP NZ instruction then works as expected.)
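The behaviour of that test can be modelled in a few lines of Python. This is a simplified sketch, not an emulator: the function names are invented here, and only the 16-bit wraparound and the zero result of the OR are modelled.

```python
def dec16(value):
    """Model of DEC rr: wraps at 16 bits and reports no flags, as on the Z80."""
    return (value - 1) & 0xFFFF

def or_test_is_zero(value):
    """Model of LD A,high / OR low: zero only when both bytes are zero."""
    high, low = value >> 8, value & 0xFF
    return (high | low) == 0

def count_passes(start):
    """Passes made by the decrement-then-test loop for a given start value."""
    passes = 0
    value = start
    while True:
        value = dec16(value)
        passes += 1
        if or_test_is_zero(value):
            return passes

print(or_test_is_zero(0x0F00))   # False: the low byte alone being zero is not enough
print(count_passes(0x1000))      # 4096
print(count_passes(0))           # 65536: a start value of zero wraps around first
```

The first line shows why checking only one byte would end the loop early, and the last shows that a start value of zero gives the longest loop rather than the shortest.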