Understanding structs and pointers
0x00 - Summary
You may have noticed something like the following image when browsing example code snippets or GitHub repositories:
This inevitably leads to the use of pointers and dereferences later in the code. All of this can look quite intimidating to newcomers. If you find yourself asking:
- What is a pointer?
- What is a struct?
- What are the names after the struct?
- Where did this come from?
Then this blog post is for you! Let's take a semi-deep dive into how all of this stuff works with example code. The ending will tie everything back together with snippets from other projects so you can see from start to finish how all of this works.
0x01 - What is a pointer?
A pointer stores the memory address of another variable. It essentially "points" to an area in memory. One benefit of pointers is that you can simply reference the memory location of your target variable as opposed to passing a complete copy of the target variable. This may seem trivial with smaller data types (e.g. why would I use a pointer for a target variable that is of "int" type?), but it is an efficient solution when passing large arguments to functions or when you are dealing with structs.
The size of a pointer depends on whether the program is built for a 32-bit or a 64-bit target. Why?
- 32-bit system = 2^32 addressable locations = any address fits in 4 bytes
- 64-bit system = 2^64 addressable locations = any address fits in 8 bytes
In C/C++, you can obtain the memory address of a variable by prepending the "&" operator to it. In the following image, the variable something will be stored somewhere in memory. In this case, we'll imagine it gets assigned the first address space. This address space contains the value of the variable something, which is 0.
If we want to create a pointer to the variable something, we can do the following:
int * psomething = &something;
The "*" after the variable type is how you declare pointers in C/C++. The "&" tells the compiler that you want the address of something rather than its value. In human-readable form, you can interpret this statement as:
the pointer psomething stores the address of the variable something
If you were to print the value of psomething, you would need to do something like this:
printf("address stored in psomething %p\n", (void *)psomething);
The %p should be an indication of what is stored in the pointer - it's the address of the variable something. Since we have the address of something, we can "dereference" the pointer to obtain the value stored at the memory address it is pointing to (i.e. the variable something). You can achieve this by performing
*psomething
This essentially states: I want to obtain the value stored at the memory address my pointer is pointing to. In the following image, the printf statement would print the value 0.
Coding Example
Reference the end of the blog post for the coding snippet used here.
If we step through this program, you'll notice the first two statements print the size of an integer pointer and of an integer. The first output shows that the size of the pointer is 8 bytes. The second shows that the size of an int is 4 bytes. You can swap out the variable type for the first statement (e.g. char *) and the output will still be 8 bytes (you should know why if you understood the earlier paragraph ;p).
The next three statements print the addresses held by our pointers. In this example, we have three pointers. The output on the right shows the three addresses they store.
The next three statements show how you obtain the address of a pointer. Since a pointer is itself a variable stored in memory, it has an address of its own, and you need some way of holding that address. This can be achieved by creating a pointer to a pointer :P It may help to read the declaration int ** var_1_ptp backwards: the variable is a pointer pointing to an integer pointer.
The following image shows the "dereference" operations in action. Notice how we are able to obtain the values stored at the original variables.
The following is another example of utilizing pointers to reference and update values. I won't go into the details of character arrays in C (just know that the variable points to the first entry of the array). You should be able to deduce what is going on in the following image (and why it works):
Interesting Scenarios
You may notice that the amount of space between the variables earlier seems... large? There are approximately 32 bytes between variables 1, 2, and 3. Why is this the case? I'm running the program in Debug mode :P (debug builds commonly pad local variables with extra guard bytes for runtime checks). You can see the difference by setting the optimization level like so or by running the program in "Release" mode.
0x02 - What is a struct?
The following two definitions from Wikipedia give the best explanation of a struct:
declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address
The C struct directly references a contiguous block of physical memory, usually delimited (sized) by word-length boundaries. It corresponds to the similarly named feature available in some assemblers for Intel processors. Being a block of contiguous memory, each field within a struct is located at a certain fixed offset from the start.
In summary, structs allow you to create an "object" and assign attributes to that object. The attributes you assign to a struct are allocated in a contiguous block of memory. Take a look at the following example where two structs are defined:
The struct "_no_idea" has a total byte size of 20. Graphically, struct "_no_idea" can be represented like so (pay attention to the byte sizes of each attribute):
We can see roughly how this calculation came to be by observing that int is 4 bytes and the character array is 10 bytes; the remaining attributes and the alignment padding the compiler inserts between them bring the total up to 20.
Since a struct is allocated as a contiguous block of memory, you can create a pointer that points to the first attribute of a struct and reach the other attributes by adding the size of each preceding attribute's data type (e.g., in this example, to get to the second attribute we need to add 4 bytes to the initial struct pointer). The following image shows this in action:
As you can see, we can access each attribute by advancing the struct pointer by the size of the preceding attribute's data type. The important takeaway from this section is that you need to be careful when incrementing pointers, because pointer arithmetic automatically moves in units of the pointed-to data type rather than in single bytes. (You'll see some tool snippets that cast a variable to a char* pointer -> this is why: a char is 1 byte, so char* arithmetic moves one byte at a time.)
0x03 - What are aliases?
You may have noticed in the previous example that there were two names following the struct declaration. These are called aliases - you can think of them as another name for the struct. You can reference the _no_idea struct either by writing struct _no_idea or by using the typedef alias no_idea, as we did in the previous example. You can also declare a pointer via pno_idea instead of writing out no_idea*. You'll see aliases many times as you dive into the WINAPI header files.
0x04 - WINAPI Types
You will no doubt come across various WINAPI data types in GitHub repositories. This section is meant to briefly highlight where they come from and how everything we have covered up to this point can be applied to build out basic tooling snippets.
The following two images show two structures you may encounter during your offensive tooling journey:
The important takeaway is understanding struct aliases and where to find their original definition. The structures in the previous images are defined in winnt.h, which houses several constants and struct definitions that are useful for offensive development.
As you progress in your malware development journey, you'll come across different types of data such as DWORD and PVOID, and aliases such as PIMAGE_FILE_HEADER. DWORD is a 32-bit unsigned integer, while PVOID and PIMAGE_FILE_HEADER are pointers; as a rule of thumb, a WINAPI type name beginning with P or LP is a pointer to the type named after the prefix. I won't get into the specifics of these WINAPI data types - this gross oversimplification will do for now :)
0x05 - What is Type Casting?
In essence, type casting is when you try to convert a data type to another. This is a gross simplification. However, I wanted to highlight an interesting snippet where you can see this in action with structs.
A rather peculiar situation is captured in the following image - we can see what happens when we inappropriately cast one struct to another (the bobby struct is missing the second integer attribute).
Notice that the first "waloo" pointer in the IDE is just a memory location. It's not until we cast that pointer to (no_idea) that the IDE loads the appropriate attributes of that pointer. When we inappropriately cast the actual struct to the "bobby" struct, the IDE makes a best effort at mapping "bobby's" attributes to whatever is pointed to by test_pointer (remember our previous spiel about structs being a contiguous block of memory - this means the IDE is assuming that the first 4 bytes are the hello attribute for bobby and the next 10 bytes are the something attribute). We've previously demonstrated how you can access attributes of a struct given a pointer - by appropriately casting your struct types, you can reference the attributes like so:
waloo4->something;
This is an important point and you will see code snippets that do this -> if you have a pointer to some memory location and that memory location is the starting address for a particular struct, you need to appropriately cast the pointer in order to access its attributes.
0x06 - Real World Example
The following image should make sense - we are now equipped to understand why and how the following snippet works (if not, reread some of the earlier paragraphs :D)
0x07 - Conclusion
We've briefly covered how pointers and structs work. This was followed by a real-world example of acquiring the start of the module list for the current process (what this is and why we want it will be covered in a later post!)
Stay tuned for more posts!