Computer memory from the ground up (part 3)

This is the third and final post in the series about computer memory. If you have not done so already, I recommend reading my two previous posts: Computer memory from the ground up (part 1) and Computer memory from the ground up (part 2).

In this post I will go through how dynamic memory is managed inside a process. That is, memory that is not known at compile time but is requested at runtime.

Most people have an idea of what happens when memory is requested, but there seems to be some confusion about how memory is returned, in the case of manual memory management, or how memory is reclaimed, in the case of automatic memory management.

This post is not about arguing that one way of managing memory is better than the other; it is about explaining their differences.

To clarify the difference between the two, I will explain memory allocation using supermarket carts.

Manual memory management

The manual management model starts by going to the cart station and picking up an empty cart. You do your shopping, and on your way out you return the cart to the cart station.

Some people do not return the cart to the station, but leave it right next to their cars. Over time the cart station will run out of carts, and the supermarket will either need to buy more carts (i.e. get more memory) or close (i.e. no more memory).

In memory terms, each abandoned cart is a memory leak: a piece of memory that was requested and allocated but never returned. The process has no way to recover that memory.

Automatic memory management

Since more and more people do not return the carts to the station but leave them next to their parking spots, the supermarket decides to act and implements a new model.

People are allowed to leave the carts anywhere on the premises, including the parking lot, and the supermarket staff will eventually pick them up and return them to the cart station.

This is how automatic memory management works. The memory is requested the same way as in the manual model; however, the memory is reclaimed after it is abandoned.

In memory management terms, the reclaiming of memory is called garbage collection.

Notice that we can still run out of carts. If that happens, then the supermarket staff will look around and see if they can reclaim unused carts. If enough unused carts are available to satisfy the demand, then there is no problem. If not enough unused carts are available, then the supermarket can buy more carts or close.

Another thing to notice is that there is no guarantee of when a garbage collection will happen. There are different heuristics and strategies, but in practice there is no way to predict it.

The nitty-gritty details

This section will focus on explaining how memory management works from the inside. I will not go into extensive detail; I will focus on simple algorithms and implementations to show how things work and fit together.

A running process will have a section of memory called the heap, which is used for dynamic memory allocations. There are several strategies to allocate the initial heap, but they are not relevant for this post. Suffice to say, the heap and the stack are usually located at opposite ends of the address space: the stack grows towards the end of the heap, and the heap grows towards the top of the stack.

There is a paradox in resource management: to manage any resource you need resources. The same applies to memory management: to manage memory you need memory. That memory is needed for the data structures that keep track of the available and the allocated memory.

For each piece of memory we want to manage and allocate, we need to keep some information. What we want to avoid is using more memory to keep track of memory than we are actually allocating. It is tempting to allow allocating single bytes, but for every single byte we would still need to keep the same bookkeeping information. For this reason, memory is allocated in chunks and not in bytes: we keep track of chunks of memory of equal size, for example 1 KiB, instead of keeping track of single bytes.

Memory allocation

In most programming languages, dynamic memory is requested by a call to new. In C, dynamic memory is requested by calling one of the malloc family of functions. To keep this post simple, I will focus on malloc and skip the new operator, since it involves the construction of objects.

The declaration of malloc is:

#include <stdlib.h>
void *malloc(size_t size);

As explained earlier, we need to divide the heap into smaller chunks that can be allocated. This leads to the first problem to solve: memory needs to be handed out in chunks, not in bytes, so we need to compute how many chunks are needed for the requested size.

There are many ways to find the real allocation size. A very simple strategy is to assign the minimum number of chunks that covers what was requested. Another simple and popular strategy is to round the number of chunks up to the nearest power of two.

For example, if our chunk size is 1 KiB (1024 bytes) and we are asked for 32 bytes, our real allocation size using the former strategy is 1 chunk. Using the latter we also allocate 1 chunk (1 is also a power of two!).

Now if the requested size is 5100 bytes, then using the first strategy we will assign 5 chunks (5120 bytes). Using the second strategy we will assign 8 chunks (8192 bytes).
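
As a rough sketch, both strategies fit in a few lines of C. The CHUNK_SIZE constant and the function names are mine, chosen for illustration; the compute_real_allocation_size helper used in the malloc snippet below could simply call one or the other:

#include <stddef.h>

#define CHUNK_SIZE 1024  /* 1 KiB chunks, as in the examples above */

/* Strategy 1: the minimum number of chunks that covers the request. */
size_t chunks_minimum(size_t size) {
  return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;  /* round up */
}

/* Strategy 2: round the chunk count up to the nearest power of two. */
size_t chunks_power_of_two(size_t size) {
  size_t needed = chunks_minimum(size);
  size_t chunks = 1;
  while (chunks < needed)
    chunks *= 2;
  return chunks;
}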

A simple way to implement malloc is as follows:

void *malloc(size_t size) {
  /* Translate the requested byte count into a number of chunks. */
  size_t chunks = compute_real_allocation_size(size);
  /* Find a free run of that many chunks and mark it as allocated. */
  return allocate(chunks);
}

This implementation works regardless of the strategy used to compute the real size, and it says nothing about how we mark the memory as allocated.

A simple way to mark memory as in use is by using a bitmap to track the available and allocated chunks:

/* One bit per chunk: each unsigned int covers 32 chunks. */
unsigned int chunks[NUMBER_OF_CHUNKS / 32];

For simplicity's sake I am using a constant for the maximum size of the array, to avoid dealing with more dynamic memory. Unsurprisingly, there are a number of implementations that follow this route and have a maximum memory size that can be managed. The downside of this strategy is that it wastes memory if the heap is not at its maximum size; however, since this is a bitmap, the wasted memory is relatively small.

Each element in the array represents a sequence of chunks, one bit per chunk. Initially all elements in the array are initialized to 0; when we allocate a chunk, we just flip its bit to 1. The following code excerpt marks an assignment of 8 chunks at the low end of element i:

chunks[i] |= 0x000000FF;

This highlights another issue that needs to be considered when allocating memory: how do we decide where to allocate? Several strategies exist, and they all have their strengths and weaknesses. What we want to avoid at all costs is high fragmentation, i.e. leaving gaps between assignments so that an allocation cannot be satisfied because there are not enough contiguous chunks, even though the total number of available chunks would be enough.
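
To make this concrete, here is a minimal first-fit sketch over the bitmap declared above: it scans bit by bit for a run of n free chunks and marks the run as allocated. The function name is an assumption, and a production allocator would scan a word at a time rather than a bit at a time:

/* First fit: find n contiguous free chunks, mark them as allocated and
   return the index of the first one, or (size_t)-1 on failure. */
size_t find_and_mark(size_t n) {
  size_t run = 0;
  for (size_t c = 0; c < NUMBER_OF_CHUNKS; c++) {
    /* A clear bit means the chunk is free; count the current free run. */
    run = (chunks[c / 32] & (1u << (c % 32))) ? 0 : run + 1;
    if (run == n) {
      size_t first = c - n + 1;
      for (size_t i = first; i <= c; i++)
        chunks[i / 32] |= 1u << (i % 32);
      return first;
    }
  }
  return (size_t)-1;  /* fragmentation: no contiguous run is long enough */
}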

Another way to keep track of available chunks is by having a linked list of chunks.

struct Chunk {
  void *address;           /* start of this free chunk */
  struct Chunk *next;
  struct Chunk *previous;
};

For each allocation we need to keep track of the allocation size and the starting address. We can use another linked list for this.

struct Allocation {
  void *address;           /* start of the allocated region */
  size_t length;           /* size of the allocation */
  struct Allocation *next;
  struct Allocation *previous;
};
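
As an illustration of the bookkeeping paradox mentioned earlier, here is a sketch of how an allocation could be recorded at the head of such a list. The Allocations global (also used by free below), record_allocation, and the fixed-size node pool are all assumptions made for the sake of the example:

#define MAX_ALLOCATIONS 1024  /* illustrative bookkeeping limit */

struct Allocation *Allocations = NULL;  /* head of the live-allocation list */

void record_allocation(void *address, size_t length) {
  /* The bookkeeping nodes themselves need memory; here, a static pool. */
  static struct Allocation pool[MAX_ALLOCATIONS];
  static size_t used = 0;
  if (used == MAX_ALLOCATIONS)
    return;  /* out of bookkeeping memory */
  struct Allocation *a = &pool[used++];
  a->address = address;
  a->length = length;
  a->previous = NULL;
  a->next = Allocations;
  if (Allocations)
    Allocations->previous = a;
  Allocations = a;
}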

Both the bitmap and the linked list strategies have problems. Bitmap implementations work very well for small allocations, i.e. allocations smaller than or equal to a row; whenever an allocation is larger than a row, the algorithm gets more complicated and the performance suffers.

A linked list implementation has the problem that, in order to keep track of regions of memory (i.e. more than one adjacent chunk), we need another linked list that points to the start of each region, and of course some kind of process to keep the different lists in sync. It is not unusual to have several lists: one covering the basic chunk size, another for runs of 2 chunks, another for 4, and so forth. For this particular reason, the power-of-two allocator is very popular: the cost of managing memory increases, but with a linked list implementation it becomes very simple to handle both small and large allocations.
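
A sketch of that multi-list layout, with one free list per power-of-two size class (the names and the number of classes are assumptions):

#define SIZE_CLASSES 8  /* classes for runs of 1, 2, 4, ... 128 chunks */

/* free_lists[k] holds free runs of exactly 2^k chunks. */
struct Chunk *free_lists[SIZE_CLASSES];

/* The size class for n chunks: the smallest k such that 2^k >= n. */
size_t size_class(size_t n) {
  size_t k = 0;
  while (((size_t)1 << k) < n)
    k++;
  return k;
}

Allocation then pops the head of free_lists[size_class(n)], splitting a run from a larger class when that list is empty; this is essentially the idea behind the classic buddy allocator.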

For the case of automatic garbage collection we will need to introduce a small modification. The algorithms to allocate memory are almost identical; the difference will become clear once we have explained how automatic garbage collection works.

Manual memory release

Releasing memory is the opposite of allocating memory. In this section I will focus on the function free and will not talk about the delete operator, since that operator involves calling destructors, which would complicate the explanation.

The declaration of free is as follows:

#include <stdlib.h>
void free(void *ptr);

If we continue with the implementation used in the memory allocation section, then the problem is to convert the address we receive to the corresponding bits and sequences in our bitmap.

Using the Allocations linked list we can easily implement free as:

void free(void *ptr) {
  /* Walk the list of live allocations looking for a matching address. */
  for (struct Allocation *a = Allocations; a; a = a->next) {
    if (a->address == ptr) {
      deallocate(a);
      return;
    }
  }
}

This is a very naive implementation and, although it works, it has really bad performance; it should only be considered for educational purposes. Also, we are not specifying how the deallocate function works. Some of its responsibilities are (a sketch follows the list):

  • Take the allocation and convert it to chunks that can be assigned (i.e. create a small linked list of chunks from the allocation)
  • Insert the linked list in the right place in the available chunks list
  • Call the function to synchronize the available chunks lists
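
For the bitmap variant those responsibilities collapse into clearing bits. A minimal sketch, assuming heap_start marks the beginning of the managed heap and that Allocation.length counts chunks (both assumptions for this example):

void *heap_start;  /* assumed: address where the managed heap begins */

void deallocate(struct Allocation *a) {
  /* Unlink the node from the live-allocation list. */
  if (a->previous)
    a->previous->next = a->next;
  else
    Allocations = a->next;
  if (a->next)
    a->next->previous = a->previous;

  /* Clear the bitmap bits covering the allocation, making it available. */
  size_t first = ((char *)a->address - (char *)heap_start) / CHUNK_SIZE;
  for (size_t i = first; i < first + a->length; i++)
    chunks[i / 32] &= ~(1u << (i % 32));
}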

Automatic garbage collection

At the beginning of this post I explained memory allocation and deallocation using shopping carts. It is relatively simple to find out whether a shopping cart is in use or not: is it abandoned and without groceries? Then it can be taken back to the cart station. The same cannot be said of a memory allocation: how do we know when it has been abandoned and is ready to be reclaimed?

The simplest strategy to do this is called tracing, and it consists of finding the allocations that are no longer reachable from a root object. To perform this, it is necessary to stop the memory management system by grabbing its lock, so that nothing is allocated while we are examining the allocations. Once we have examined all the allocations, we release the lock so the memory management system can continue. This strategy is simple, but it imposes a big restriction: no allocations can be performed while we are traversing the lists.

Another common strategy is to use reference counting for each allocation. An allocation that has a reference count of zero is marked as reclaimable. Depending on the implementation, we either run a small garbage collection immediately to reclaim that memory, or wait until a later point in time.

During the memory allocation explanation I mentioned that I would come back to allocation once automatic garbage collection had been explained, in order to introduce the changes required in the allocation strategy.

In order to implement tracing as the garbage collection strategy, we need to build a directed graph of objects so we know which allocations are in use and which are not. This graph starts from a root object and links to every allocation. It is a graph and not a tree because the same object can be linked from different branches. The traversal has to remember which objects it has already visited; otherwise a cycle, i.e. a link back to an object earlier in the hierarchy, would make it loop forever.
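
A minimal sketch of that marking traversal, assuming each object carries a marked flag and an array of outgoing references (both layouts are made up for the example):

struct Object {
  int marked;             /* cleared before each collection pass */
  struct Object **refs;   /* outgoing links to other objects */
  size_t num_refs;        /* number of outgoing links */
};

void mark(struct Object *o) {
  if (o == NULL || o->marked)
    return;  /* already visited: this check keeps cycles from looping */
  o->marked = 1;
  for (size_t i = 0; i < o->num_refs; i++)
    mark(o->refs[i]);
}

After mark(root) has run, any allocation whose object is still unmarked is unreachable and can be reclaimed; that second pass is usually called the sweep phase.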

Using reference counting, we need to introduce the counting somewhere; this is usually done at allocation time. The language also has to take into account when a pointer is directly assigned to an object:

someObject->property = anotherObject;
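
Conceptually, the runtime turns such an assignment into a retain of the new referent followed by a release of the old one. A minimal sketch, where the struct layout and the function names are assumptions:

struct RefCounted {
  size_t count;  /* number of live references to this allocation */
  /* ... payload ... */
};

void reclaim(struct RefCounted *o);  /* assumed: returns the memory */

void retain(struct RefCounted *o) {
  if (o)
    o->count++;
}

void release(struct RefCounted *o) {
  if (o && --o->count == 0)
    reclaim(o);  /* zero references: reclaim now or at a later collection */
}

/* someObject->property = anotherObject; becomes, conceptually: */
void assign(struct RefCounted **slot, struct RefCounted *value) {
  retain(value);   /* retain first, so self-assignment is safe */
  release(*slot);
  *slot = value;
}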

There are several other strategies for automatic garbage collection; one of the most advanced is the JVM's G1 garbage collector. Explaining how that implementation works is outside the scope of this post, but I urge you to read about it.

Some closing words

I have explicitly left out any mention of memory pages when talking about the allocation and reclamation of memory. Although not strictly necessary, a performant implementation will try to fit allocations into memory pages and avoid splitting chunks across pages.

What’s next?

In my next post I will switch to a completely different topic: data stream processing. In the meantime, please share this post with your friends!

