Pointers are a very powerful tool in C and similar programming languages. They are special variables that don’t directly contain a value; rather, they “point to” (contain the starting memory address of) the location of a value stored in memory. This “pointed-to” value can be any type — an integer, a floating-point value, a struct, or even another pointer. A pointer, in other words, doesn’t have the information you’re looking for — but it tells you where to go to get that information.
The use of pointers allows the construction of powerful data structures, including linked lists, queues and deques (double-ended queues), and trees. The basic idea behind pointers is easy enough once you understand the concept; it’s usually the syntax that programmers find confusing.
Here is yet another attempt (pointer syntax has been confusing people for decades) to end the confusion and explain, simply and clearly, how to get C pointers to do what they do, including why the special characters (asterisk, ampersand, etc.) are needed. (I’ll assume you are already familiar with the basics of C programming, including declaring and assigning standard variables.)
First, a quick summary for those already familiar with the concept of pointers, but wanting a quick, concise explanation of C pointer syntax. Here is the simplest way of thinking of it that I have come up with:
- & means “The memory address of the variable named…”
- * means “The contents of the memory location pointed to by…”
For example, “int *x” means “The contents of the memory location pointed to by x is an integer.” Likewise, “mypointer = &y” means “set mypointer equal to the memory address of the variable named y.” Note how the intuitive definitions above for & and * can simply be dropped into place, verbatim. Remove the symbols from the code and the definitions from the explanation, and the examples still make sense in a non-pointer context.
If you’re not already very familiar with both C programming as well as the idea of pointer variables, though, the above explanation won’t be of much help. In that case, a more complete explanation of what is going on is needed. Read on.
Let’s start with a simple example: declaring myval to be an integer equal to three:
int myval = 3;
This is straightforward enough: myval now refers to the value stored in a specific memory location. The value stored there is currently three (implemented as a signed integer, probably 32 bits wide on a modern desktop system). When this variable was declared, the compiler arranged for space to be set aside for it; for a local variable like this one, that is typically a slot on the stack reserved when the function is entered. We, as programmers, don’t (yet) know exactly where in memory this value is stored, however. For basic C programming, it doesn’t matter, but when working with pointers, we might need to know.
Now, suppose we want to know where in memory myval is stored. (For now, trust me that this is a useful thing to know.) We create a “pointer” variable, which doesn’t itself hold our data, but instead holds the number of a memory location (presumably one containing our data or something else of interest).
int *myval_pointer;
This line creates a new “pointer variable” called myval_pointer. (It doesn’t have to have “pointer” in the name — that’s just to help us remember what it is, for now. I could have called it mypointer, testpointer, or Fred, for all the compiler cares.) This new variable is set up to hold a memory location. The “int” part tells the compiler that when we use this pointer to look up the contents of a memory location, we intend for the raw data there (bytes) to be interpreted as a signed integer.
Right now, though, this new pointer doesn’t point anywhere useful. What it contains depends on where it was declared: a pointer declared outside any function (or marked static) starts out as a null pointer, while an ordinary local pointer like this one holds whatever leftover value happened to be in that memory, and using that value is undefined behavior. (Remember, always initialize your variables yourself!) Let’s put this new pointer variable to use and have it point to the location in memory where myval is stored. (We don’t know where this is, but the compiler does!)
myval_pointer = &myval;
This statement sets the value of myval_pointer to the address of myval (some large number, perhaps in the billions on a system with an address space in the multi-gigabyte range). The = is the usual assignment operator, and the & symbol stands for “the memory address of.” So now, myval_pointer does indeed point to myval: it holds myval’s address. (Remember, this is because we assigned it this way, not because of how it’s named.)
Now, let’s see what this new way of accessing memory can do.
*myval_pointer = *myval_pointer + 1;
This statement increments the value in the memory location pointed to by myval_pointer by one. (The * symbol can be thought of as meaning “the contents of the memory location pointed to by.”) Since this memory location is the one used by myval, what we’ve done is increment the value of myval directly in memory, without referring to it by name. If we were to print out the value of myval now, it would be 4. Compare the above line of code to the following:
myval_pointer = myval_pointer + 1;
You might think that this would increment the value of myval_pointer by one, making it point to a location one byte higher in memory. This actually isn’t the case, though: the compiler scales the increment by the size of the type the pointer was declared to point to, so the address goes up by sizeof(int) bytes (four, on most current systems). This statement doesn’t affect the value of myval in any way. What it does is make myval_pointer point to the next int-sized slot just past myval. (This is exactly what you want when stepping through an array of variables. Here, though, myval is a single int, so nothing valid lives at the new address, and reading or writing through the pointer now would be undefined behavior.)
Here is a quick example program showing some of the ways that pointer-variable syntax works. Try making your own modifications to see what happens. I recommend compiling it with gcc on Linux, in a regular user (i.e., non-root) account.
//Basic C pointer operation examples
//M. Eric Carr / Paleotechnologist.net
#include <stdio.h>
int main(void){
    //Declare a simple integer variable
    int myval = 3;
    //Declare a pointer-to-an-integer
    int* myval_pointer;
    //Assign the address of myval to myval_pointer
    myval_pointer = &myval;
    //Show the initial values of the variables.
    //(%p is the printf format for pointers; it expects a void *.)
    printf("myval is %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    //This increments the value and does not move the pointer.
    *myval_pointer = *myval_pointer + 1;
    printf("myval is now %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    //This moves the pointer up by one int (sizeof(int) bytes, typically four).
    //It parses as *(myval_pointer++): the pointer is incremented, and the
    //value read through the old address is simply discarded.
    //The value in the original location does not change.
    *myval_pointer++;
    printf("myval is now %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    myval_pointer--; //Undo this change.
    //This also moves the pointer up by one int; the parentheses
    //around the pointer's name change nothing.
    *(myval_pointer)++;
    printf("myval is now %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    myval_pointer--; //Undo this, too.
    //This increments the pointed-to value.
    //(It's unintuitive that postfix ++ binds more tightly
    // than the pointer dereferencing operator *, but
    // there you have it. The parentheses here force the
    // dereference to happen first.)
    (*myval_pointer)++;
    printf("myval is now %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    //What happens when we increment the pointer by one?
    myval_pointer = myval_pointer + 1;
    printf("myval is now %d.\n", myval);
    printf("myval_pointer is %p.\n\n", (void *)myval_pointer);
    myval_pointer--; //Undo this, too.
    return(0);
}
…So what are pointers good for? What can they do? That’s actually quite an in-depth topic, but one of the most useful features of pointers is that they can be used to create “linked lists” and related data structures (trees, queues, and many more).
Unlike an array, which has to be allocated as a single block of memory before it is used, a linked list allows elements to be efficiently added, removed, and moved around. Instead of the “box of pigeonholes” metaphor of arrays, a linked list can be thought of as links in a chain. More links can be added, and links can be removed from either end or anywhere in the middle. More advanced structures, such as trees, build on the same idea by giving each element more than one pointer.
The way a simple linked list works is by setting up a custom data type. Whereas a simple data type would contain a numerical value, a character, or perhaps a memory location (if it’s a pointer), this custom type contains one or more pieces of data (the “payload”) and a pointer to the same custom data type.
This sounds unintuitive, until you realize that the addition of the pointer allows each data element to point to the next one in the chain. By maintaining a single pointer which points to the start of the list, a program can traverse the list, looking for a desired record, adding up totals, or whatever other operations are useful.
By convention, the pointer in the final element of the list is set to NULL, meaning that it doesn’t point to any memory location. Well-written code that traverses the list from start to finish checks for this special NULL value and stops processing when it reaches it.