The I2S protocol is pretty straightforward: it uses a bit clock and a word clock to transfer raw (stereo or mono) audio data in digital form. The word-clock rate sets the sample rate, and the number of bits sent per word-clock cycle sets the sample depth. In theory, it’s possible to mathematically define the waveform of the audio you want to create, sample by sample and bit by bit.
And that’s both the upside and the curse. You get perfect control over the audio you produce (in digital form, anyway), but in return, you have to feed the beast. If you’re using CD audio parameters, that’s two 16-bit audio samples that need to be generated 44,100 times every second.
Quick, what’s 32,000 * sin(2*PI * 440*curSample/SAMPLES_PER_SECOND)?
Too late: at one second divided by 44,100 frames divided by two channels, you only had about 11.34 microseconds!
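For concreteness, here’s a minimal sketch of what feeding the beast looks like. It assumes the legacy ESP-IDF I2S driver (driver/i2s.h) that arduino-esp32 bundles; the buffer sizes, the 440 Hz tone, and the skipped pin setup are my choices for illustration, not anything from the actual code.

```cpp
#include <driver/i2s.h>
#include <math.h>

#define I2S_SAMPLE_RATE 44100

static uint32_t curSample = 0;

void setupI2S() {
  i2s_config_t cfg = {};  // zero-init, then fill in the fields that matter
  cfg.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX);
  cfg.sample_rate = I2S_SAMPLE_RATE;
  cfg.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT;
  cfg.channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT;  // stereo frames
  cfg.communication_format = I2S_COMM_FORMAT_STAND_I2S;
  cfg.dma_buf_count = 8;
  cfg.dma_buf_len = 256;
  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  // i2s_set_pin(I2S_NUM_0, ...) goes here; the pins depend on your wiring.
}

void loop() {
  int16_t frames[256][2];  // [sample][left/right]
  for (int i = 0; i < 256; i++) {
    // One 440 Hz sample; this math has to happen for every single frame.
    float angle = 2.0f * (float)M_PI * 440.0f * (float)curSample++ / I2S_SAMPLE_RATE;
    int16_t s = (int16_t)(32000.0f * sinf(angle));
    frames[i][0] = s;
    frames[i][1] = s;
  }
  size_t written;
  i2s_write(I2S_NUM_0, frames, sizeof(frames), &written, portMAX_DELAY);
}
```

If the per-sample math can’t keep up with the DMA buffers draining, the output glitches; that’s the beast demanding to be fed.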
This sort of thing is why, when I asked GPT-4o for example I2S code to generate a 440 Hz test signal, it used the sinf() function instead of the usual sin(). That’s the standard C single-precision variant, and I’m still not 100% sure which helpfully-included-for-me-because-Arduino implementation is backing it here, but running benchmarks, it’s roughly 7x faster, for a slight loss in accuracy. I think it’s using a 6-term Taylor series expansion, if it’s similar to sinf() code I found online.
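If it really is a truncated Taylor series, it would look something like this. To be clear, this is my speculative reconstruction, not the actual library code; real libm implementations typically use minimax polynomials and more careful range reduction.

```cpp
#include <math.h>

// Speculative 6-term Taylor-series sine: range-reduce the argument to
// [-pi, pi] so the series converges quickly, then sum
// sin x = x - x^3/3! + x^5/5! - ... up through the x^11 term.
float taylorSin(float x) {
  x = fmodf(x, 2.0f * (float)M_PI);
  if (x > (float)M_PI)  x -= 2.0f * (float)M_PI;
  if (x < -(float)M_PI) x += 2.0f * (float)M_PI;

  float x2 = x * x;
  float term = x;  // current series term, starting at x
  float sum = x;
  for (int n = 1; n < 6; n++) {
    term *= -x2 / (float)((2 * n) * (2 * n + 1));  // next odd-power term
    sum += term;
  }
  return sum;
}
```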
Could sine computation be made even faster, if some memory were set aside as a lookup table? I coded up a fastSin() function to look up float32 sine values from a table, indexed by an integer scaling of a float32 parameter. Swapping this in for sinf() and testing it by having each function do a million-sin summation, it worked, and was some 20% faster! At about 1.24 microseconds each, I can afford to crank the sample rate up to an almost reasonable value! (A sketch of the lookup approach follows the benchmark output below.)
```
Making sine table...
Done.
Testing sin()...(3791.946274): 92282.847 ops/s (10836.250 ns/op)
Testing sinf()...(3791.948715): 657273.786 ops/s (1521.436 ns/op)
Testing fastSin()...(3787.528053): 807325.347 ops/s (1238.658 ns/op)
```
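For reference, the lookup approach boils down to something like this. The table size, the power-of-two mask trick, and the exact scaling here are my guesses at an implementation, not the actual fastSin() code.

```cpp
#include <math.h>

#define SINE_TABLE_SIZE 4096  // a power of two, so the index wraps with a mask
static float sineTable[SINE_TABLE_SIZE];

// Precompute one full cycle of sine ("Making sine table...").
void makeSineTable() {
  for (int i = 0; i < SINE_TABLE_SIZE; i++) {
    sineTable[i] = sinf(2.0f * (float)M_PI * (float)i / SINE_TABLE_SIZE);
  }
}

// Scale the angle (radians) to a table index, then mask it back into
// range; the mask doubles as the modulo-2*pi wrap.
float fastSin(float x) {
  int idx = (int)(x * (SINE_TABLE_SIZE / (2.0f * (float)M_PI)));
  return sineTable[idx & (SINE_TABLE_SIZE - 1)];
}
```

The power-of-two table size is the key design choice: the integer index wraps with a single AND instead of a modulo, which matters when the budget is a microsecond per call.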
Well, it almost worked. After a while, the waveforms started to look somewhat shaky — and this got worse as time progressed. Resetting the ESP32 cleaned things back up, so something was going wrong with the software. Was all this caused by that ~1% error?!?
Thinking I had introduced an error with the fastSin() function, I recompiled with the left channel using sinf() and the right channel using fastSin(), to see when they started to differ. Weirdly, both of them acted similarly, so whatever the problem was, it wasn’t the fastSin() code.
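That experiment is cheap when you’re already emitting stereo frames. Roughly (reusing fastSin() from the sketch above, with the frame layout assumed):

```cpp
#include <stdint.h>
#include <math.h>

// One stereo frame: identical phase into both implementations, so any
// real difference shows up as the two channels drifting apart on a scope.
void makeTestFrame(uint32_t curSample, int16_t frame[2]) {
  float angle = 2.0f * (float)M_PI * 440.0f * (float)curSample / 44100.0f;
  frame[0] = (int16_t)(32000.0f * sinf(angle));     // left: libm sinf()
  frame[1] = (int16_t)(32000.0f * fastSin(angle));  // right: table lookup
}
```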
After some diagnosis, the problem turned out to be floating-point loss of precision. Floating point numbers are represented in a mantissa-and-exponent format. Oversimplifying, they’re scientific notation numbers in binary, and there is a limited amount of precision available for the mantissa. Larger numbers can still be represented, but at the cost of absolute precision: double the magnitude of the number, and the gap between adjacent representable values doubles. Once numbers get larger than 2^24 or so, that gap in a float32 grows beyond 1.0. My ever-growing sample counter was exactly such a number: as curSample climbed into the millions, the values feeding the sine computation carried less and less absolute precision, and the phase wobbled more and more. And for angles, which only matter modulo 2π, we need to do better than this.
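You can watch the gap grow with a few lines of standard C: nextafterf() returns the next representable float after its first argument, so the difference between the two is the local precision. (This is a desktop-side demo, not ESP32 code.)

```cpp
#include <stdio.h>
#include <math.h>

int main(void) {
  // Print the gap between adjacent float32 values at growing magnitudes.
  for (float x = 1.0f; x <= 16777216.0f; x *= 4096.0f) {  // 1, 2^12, 2^24
    printf("near %12.1f, the next float is %g away\n",
           x, nextafterf(x, INFINITY) - x);
  }
  return 0;
}
```

Near 1.0 the gap is about 1.2e-7; by 2^24 it’s a full 2.0, hopeless for a value that’s ultimately an angle in radians.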
Capping the sample number at I2S_SAMPLE_RATE*24, wrapping it back to zero there, seemed to be a good compromise: the counter stays below about 1.06 million, comfortably inside float32’s exact-integer range, and since any whole-number frequency completes an integer number of cycles in 24 seconds, the wrap is phase-continuous. The waveforms seem to have noticeably fewer glitches now.
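The fix amounts to a couple of lines; a sketch, using the counter name from the text (SAMPLE_CAP is my name for the limit):

```cpp
#include <stdint.h>

#define I2S_SAMPLE_RATE 44100
#define SAMPLE_CAP (I2S_SAMPLE_RATE * 24)  // 1,058,400: far below 2^24

static uint32_t curSample = 0;

// Advance the counter, wrapping every 24 seconds. Any whole-number
// frequency completes an integer number of cycles in 24 s, so the
// waveform is continuous across the wrap.
static inline void advanceSample(void) {
  if (++curSample >= SAMPLE_CAP) curSample = 0;
}
```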
Moral: float32s have only a 24-bit significand (23 bits stored, plus an implicit leading 1), and that budget has to cover both the magnitude of your number and its fine detail. Choose your scale carefully!