¿Cómo se llama su LLaMA?

Large Language Models (LLMs) such as ChatGPT are dramatically changing how many information tasks (notably, writing code) are done. Thus far, these models have generally required extremely powerful servers running multiple NVIDIA H200 Tensor Core GPUs, sometimes with hundreds of gigabytes of available memory.

Recently, though, there has been a good deal of success in paring these huge models down to a size where modern home PC workstations (or, apparently, even a modern Raspberry Pi!) can run them. In just a few minutes, I was able to download the “Ollama” server software for Meta’s “Llama” LLM models, select a model to try, and host it locally on my PC. With a little more setup (setting the OLLAMA_HOST environment variable so the server listens on the local network, not just 127.0.0.1), the Llama server was available across my local network, and I was able to chat with it via a Llama client app for my Android phone.
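If you’d rather script against the model than chat with it, Ollama also exposes a simple HTTP API on port 11434. Here’s a minimal C sketch using libcurl (the /api/generate endpoint and JSON fields follow Ollama’s documented API; the model name is whatever you’ve pulled):

/* Minimal Ollama client sketch: POST a prompt to a locally hosted model.
   Build with: gcc ollama_query.c -lcurl
   Assumes the Ollama server is running and "llama3.2" has been pulled. */
#include <stdio.h>
#include <curl/curl.h>

/* libcurl write callback: dump the JSON response to stdout. */
static size_t onData(void *buf, size_t size, size_t nmemb, void *userdata) {
    (void)userdata;
    fwrite(buf, size, nmemb, stdout);
    return size * nmemb;
}

int main(void) {
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    const char *body =
        "{\"model\": \"llama3.2\", "
        "\"prompt\": \"Why is the sky blue?\", "
        "\"stream\": false}";

    struct curl_slist *hdrs =
        curl_slist_append(NULL, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, onData);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}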

As a test, I asked the first model I tried (Llama 3.1) to write an article for this blog, describing how to set up a similar Llama server. Oddly, it insisted that it could not help with this, since Meta’s LLM models are not intended to run on home servers. (It even stuck to its guns after I pointed out that it was, in fact, running on my main home PC workstation, Scientia.)

So I loaded up Llama 3.2 (a smaller but newer model), and gave it the same request.
Here’s the response I got — pasted directly from the text file I used to save it.

Setting Up Your Own LLaMA LLM Server: A Step-by-Step Guide

The Llama Large Language Model (LLM) has revolutionized the field of natural language processing, enabling rapid advancements in areas such as conversational AI, text generation, and more. While access to pre-trained LLaMA models can be limited, setting up your own server at home can provide unparalleled flexibility and control over the model’s configuration.

Hardware Requirements

Before embarking on this project, it’s essential to understand the hardware requirements for popular LLaMA model sizes:

  • Small: 8-16 GB RAM, NVIDIA Tesla V100 or A100 GPU (optional)
  • Medium: 32-64 GB RAM, NVIDIA A100 GPU
  • Large: 128-256 GB RAM, multiple NVIDIA A100 GPUs
  • Huge: 512-1024 GB RAM, multiple high-end NVIDIA A100 GPUs

For a home setup, it’s recommended to start with the Small or Medium model sizes. This will allow you to test and refine your LLaMA server without incurring significant hardware costs.

Setup and Configuration

To set up your own LLaMA LLM server at home, follow these steps:

  1. Choose a Hardware Platform: Select a suitable platform for your server, such as:
    • Raspberry Pi (for small models)
    • NVIDIA Tesla V100 or A100 GPU-based servers (for medium to large models)
    • High-end desktops or workstations
  2. Install the Required Software:
    • Install the PyTorch library and its dependencies.
    • Configure your system’s network settings for optimal performance.
  3. Download the LLaMA Model: Use the official Hugging Face repository to download the desired LLaMA model size.
  4. Prepare the Dataset: Prepare a suitable dataset for fine-tuning the LLaMA model, such as a large text corpus or a custom dataset of your choice.
  5. Configure the Training Script: Write a training script using PyTorch and customize it to suit your needs. You can use the official Hugging Face LLaMA training script as a starting point.

Tips and Considerations

  • Power Consumption: Be mindful of power consumption when choosing hardware, especially for larger model sizes.
  • Memory and Storage: Ensure sufficient memory (RAM) and storage to accommodate the LLaMA model and your dataset.
  • Cooling Systems: If using high-end GPUs, consider investing in a reliable cooling system to prevent overheating.

Conclusion

Setting up your own LLaMA LLM server at home can be a rewarding experience for AI enthusiasts and researchers. By following these steps and considering the typical hardware requirements, you can create a custom setup that suits your needs. Remember to carefully evaluate power consumption, memory, and storage requirements to ensure optimal performance.

Despite what it claims about the availability of trained models, this was produced by an as-downloaded copy of Llama 3.2 without any additional local training or prompt engineering. Not bad, for a free, relatively easy-to-use model that I can run on existing hardware and have as a local resource. I think I’ll still keep my GPT subscription, but it’s definitely impressive to be able to host your own LLM at home. (And the responses are fast — much faster than I’ve seen even with quick cloud-hosted models like GPT-4 Turbo.)

I love living in the future!

Posted in Uncategorized

Laser Spirograph

Many Generation-X geeks like me have happy memories of creating weird and wonderful geometric drawings with a Spirograph set — a toy invented in the 1960s in its modern form.

It occurred to me that the Lissajous-like patterns that the Spirograph produced would work well as laser engraver line art. The Spirograph’s gears can be easily replaced with lines of code, calculating the position the “pen” would be at for a given angle of rotation.

A series of tests of the Spirograph algorithm (with one moving “gear”).
Circles-in-circles can produce some surprisingly straight paths!

Engraving was done (for the spirograph designs) at 10mm/s and 50% power, on an Atomstack 40W laser. The algorithm has constants for the radii of the outer (fixed) and inner (moving) circles. The point radius (corresponding to which hole is used for the pen) is a variable, but is typically held fixed for a complete cycle of 2Nπ radians. (The large hexagon in the above image is a sequence with the same inner and outer radii, and 31 different point radii from -15mm to 15mm.)
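The GitHub code is in BASIC, but the underlying math is just the classic hypotrochoid equations. Here’s a quick C sketch of the idea (the constant names and sample radii here are mine, not the values used for the engravings above):

/* Hypotrochoid: a "gear" of radius r rolls inside a fixed ring of radius R,
   with the pen at distance p (the point radius) from the gear's center. */
#include <math.h>
#include <stdio.h>

static void spiroPoint(double R, double r, double p, double t,
                       double *x, double *y) {
    *x = (R - r) * cos(t) + p * cos(((R - r) / r) * t);
    *y = (R - r) * sin(t) - p * sin(((R - r) / r) * t);
}

int main(void) {
    double R = 60.0, r = 25.0, p = 10.0;  /* radii in mm (example values) */
    /* For integer radii, the pattern closes after N = r / gcd(R, r) turns;
       here gcd(60, 25) = 5, so N = 5 and one full cycle is 2*5*pi radians. */
    for (double t = 0.0; t <= 10.0 * M_PI; t += 0.01) {
        double x, y;
        spiroPoint(R, r, p, t, &x, &y);
        printf("%.3f %.3f\n", x, y);  /* one polyline point per step */
    }
    return 0;
}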

Here’s the GitHub. Have fun!

Posted in BASIC, Laser Cutter, Toys

The USB-C High Voltage Hydra: Don’t Get Bit!

Multi-connector USB-C charging cables, which allow you to charge multiple devices simultaneously using a single USB-A port, are certainly convenient. These cables typically feature one data-capable connector and multiple charging-only connectors, making them seemingly ideal for powering multiple devices on the go. However, as with many technological conveniences, there are hidden risks.

Recently, I purchased a 1-to-4 USB-A to USB-C charging cable, with one data-capable connector and three charging-only connectors. Initially, it seemed like a great way to simplify my charging setup. However, I soon discovered a significant design flaw: the voltage on the shared V+ line follows whatever voltage the device on the data connector requests.

The monster. Whatever voltage the data connector (weirdly in orange) requests, everybody gets (!!)

USB defaults to +5VDC power, which has been the standard since USB first came out. (How much current a device should draw is another discussion.) But with the USB-C Power Delivery (PD) specification, USB-C capable devices can request higher voltages and/or current limits, which will then cause the power supply to change to that higher voltage.

For example, when I connected my Samsung A71 phone to a power bank via this cable’s data connector, it requested and received 9V on the power rail (as measured by a handy passthrough USB analyzer). This 9V was then applied across the V+ lines on the other three cables, potentially subjecting any devices connected to the charging-only connectors to a voltage they were not designed to handle. (Needless to say, I suspect this cable is not standards-compliant. Hey, it was cheap.)

USB-C Power Delivery (PD) is a specification that allows devices to negotiate the power they receive. This negotiation can result in voltages of 5V, 9V, 15V, or even 20V, depending on the device’s requirements and the charger’s capabilities. While PD provides a great way to supply more power when needed, it also introduces risks when multiple devices are connected without proper isolation.

Not all USB-C devices are tolerant of higher voltages. Many devices that do not support PD negotiation are only rated for 5V. If these devices are exposed to higher voltages, they can be damaged or even rendered inoperable. This makes it crucial to ensure that devices on the charging-only connectors are either capable of handling the higher voltage or are protected from it.

If you’re charging four of the same type of thing (like those USB-rechargeable AA batteries), such cables will probably work (as long as the individual devices don’t draw too much power without negotiating for it). But if you just plug in devices to charge them (as many purchasers of these cables will no doubt do), take care that the data-cable device doesn’t see a native USB-C connection, request 20V to fast-charge itself, and fry the rest of the devices connected to the cable.

After all, power is proportional to V², so a device drawing 1x power at 5V will draw 16x the power at 20V, assuming it is essentially a resistive load. This will usually end with some part of the device releasing its magic smoke due to trying to dissipate 16x as much power as it should.

So, while there are good uses for such cables — use them carefully!

Coauthored with GPT-4o.
(They wrote the article from a description I provided; I edited for style.)

Posted in Design, Digital Citizenship, Electronics, Power

Artisanal Digital Audio

The I2S protocol is pretty straightforward — it uses a bit clock and a word clock to transfer raw (stereo or mono) audio data in digital form. The clock speeds determine the sample rate, and the number of bits sent per word clock determines the sample depth. In theory, it’s possible to mathematically define the waveform of the audio you want to create — sample by sample and bit by bit.
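To make the clock math concrete, here’s the arithmetic for CD-quality audio (this is just the relationship described above, not code from the project):

/* I2S clock arithmetic: the bit clock runs at
   sample_rate * bits_per_sample * channels, and the word clock at
   sample_rate. */
#include <stdio.h>

int main(void) {
    int sampleRate = 44100, bitsPerSample = 16, channels = 2;
    printf("word clock: %d Hz\n", sampleRate);          /* 44100 Hz   */
    printf("bit clock:  %d Hz\n",
           sampleRate * bitsPerSample * channels);      /* 1411200 Hz */
    return 0;
}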

And that’s the upside and the curse. You get perfect control over the audio you produce (in digital form, anyway), but in return, you have to feed the beast. If you’re using CD audio parameters, that’s two 16-bit audio samples that need to be generated 44,100 times every second.

Quick, what’s 32,000 * sin(2*pi*440*curSample/SAMPLES_PER_SECOND)?
Too late — you only had about 11.33 microseconds!

This sort of thing is why, when I asked GPT-4o for example I2S code to generate a 440Hz test signal, it used the sinf() function, instead of the usual sin(). I’m still not 100% sure which helpfully-included-for-me-because-Arduino library is being used here, but running benchmarks, it’s something like 6x faster, for a slight loss in accuracy. I think it’s using 6-term Taylor series expansions, if it’s similar to sinf() code I found online.
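For reference, the inner loop of such a test-signal generator looks something like the sketch below. This is a hardware-agnostic reconstruction, not GPT-4o’s actual code; handing the finished buffer to the I2S driver is left out:

/* Fill a stereo int16 buffer with the next chunk of a 440Hz sine at CD
   parameters. curSample is a running sample counter kept by the caller. */
#include <math.h>
#include <stdint.h>

#define SAMPLES_PER_SECOND 44100
#define TONE_HZ            440.0f

void fillChunk(int16_t *buf, int frames, uint32_t *curSample) {
    for (int i = 0; i < frames; i++) {
        float angle = 2.0f * (float)M_PI * TONE_HZ *
                      (float)(*curSample) / SAMPLES_PER_SECOND;
        int16_t s = (int16_t)(32000.0f * sinf(angle));
        buf[2 * i]     = s;   /* left channel  */
        buf[2 * i + 1] = s;   /* right channel */
        (*curSample)++;
    }
}

Note the (float)(*curSample) conversion in there; that cast turns out to matter later.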

Could sine computation be made even faster, if some memory were set aside as a lookup table? I coded up a fastSine() function to look up float32 sine values from a table, based on an integer scaling of a float32 parameter. Swapping this in for sinf() and testing it by having each function do a million-sin summation, it worked — and was some 20% faster! At about 1.25 microseconds each, I can afford to crank the sample rate up to an almost reasonable value!
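In sketch form, it looks something like this (the table size and rounding details are my guesses, not necessarily the exact code behind the numbers below):

/* Sine lookup table: trade ~16KB of RAM for speed. */
#include <math.h>

#define SINE_TABLE_SIZE 4096              /* entries per 2*pi cycle */
static float sineTable[SINE_TABLE_SIZE];

/* Fill the table once at startup. */
void makeSineTable(void) {
    for (int i = 0; i < SINE_TABLE_SIZE; i++) {
        sineTable[i] = sinf(2.0f * (float)M_PI * (float)i / SINE_TABLE_SIZE);
    }
}

/* Look up sin(angle) by scaling the float32 angle to an integer index. */
float fastSine(float angle) {
    int idx = (int)(angle * (SINE_TABLE_SIZE / (2.0f * (float)M_PI)));
    idx %= SINE_TABLE_SIZE;               /* wrap to one period */
    if (idx < 0) idx += SINE_TABLE_SIZE;  /* handle negative angles */
    return sineTable[idx];
}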

Making sine table...
Done.
Testing sin()...(3791.946274): 92282.847 ops/s (10836.250 ns/op)
Testing sinf()...(3791.948715): 657273.786 ops/s (1521.436 ns/op)
Testing fastSin()...(3787.528053): 807325.347 ops/s (1238.658 ns/op)

Well, it almost worked. After a while, the waveforms started to look somewhat shaky — and this got worse as time progressed. Resetting the ESP32 cleaned things back up, so something was going wrong with the software. Was all this caused by that ~1% error?!?

Sines with float32 precision error

Thinking I had introduced an error with the fastSine() function, I recompiled with the left channel using sinf() and the right channel using fastSine(), to see when they started to differ. Weirdly, both of them acted similarly — so whatever the problem was, it wasn’t the fastSine() code.

After some diagnosis, the problem turned out to be caused by floating point dilution of precision. Floating point numbers are represented in a mantissa-and-exponent format. Oversimplifying, they’re scientific notation numbers in binary — and there is a limited amount of precision available for the mantissa. Larger numbers can be represented, but at the cost of precision. Double the size of the number, and you halve the precision. Once numbers get larger than 2^24 or so, the representation inaccuracy in float32 can be larger than 1.0. And for angles, we need to do better than this.
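You can watch the effect in isolation with a tiny test (nothing ESP32-specific here):

/* float32 runs out of integer resolution at 2^24. */
#include <stdio.h>

int main(void) {
    float big = 16777216.0f;           /* 2^24 */
    printf("%.1f\n", big + 1.0f);      /* prints 16777216.0: the +1 is lost */
    /* A float32 sample counter hits the same wall: past 2^24 samples
       (about six minutes at 44.1kHz), counter + 1.0f == counter. */
    return 0;
}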

Capping the sample number at I2S_SAMPLE_RATE*24 (wrapping the counter back to zero every 24 seconds; conveniently, that’s a whole number of 440Hz cycles, so there’s no phase jump at the wrap) seemed to be a good compromise, and the waveforms now have noticeably fewer glitches.

Moral: float32s only have a 24-bit significand (23 bits of it stored). Choose your scale carefully!

Posted in Arduino, C, Digital, I2S, Math, Troubleshooting