Design Article

Garbage collection in C-language code applications

Steve Rhoads

8/14/2009 7:14 PM EDT

The C programming language is ideally suited for embedded systems. It provides low level control, portability, as well as structured programming. However, it does not provide native garbage collection.

Many languages such as Java and C# support garbage collection so that a developer no longer has to explicitly free memory. Once a block of memory is no longer referenced by any pointers, the garbage collector will automatically free the memory, preventing any memory leaks.

The lack of a garbage collector could be considered a limitation of C. However, it is possible to implement a trivial garbage collector if we are willing to live with a few limitations:

1. Instead of calling malloc() to acquire memory, call CollectMalloc(bytes, ptrCount).

2. All structures that have pointers to other structures must have the pointers at the top of the structure. This permits the garbage collector to know which elements of a structure are pointers:

typedef struct Node {
  struct Node *next, *prev;   //must be at the top
  int value;
} Node;

3. When acquiring memory with CollectMalloc(bytes, ptrCount), the number of pointers at the top of the structure must be specified in ptrCount.

4. All fixed root pointers such as head and tail pointers must be registered by calling CollectRoot(&head).

5. The garbage collection function must be explicitly called.

With these limitations, a garbage collector can determine which data structures are no longer referenced so that they can be freed. When CollectMalloc() is called, the garbage collector places an OBJECT header above each requested block of memory:

typedef struct OBJECT {
  unsigned char magic;   //Contains 0x47
  unsigned char referenced;   //Is this object referenced?
  unsigned char ptrCount;   //Number of GC pointers in object
  unsigned char index;   //Number of followed GC pointers
  struct OBJECT *next;   //Next object in linked list of objects
  struct OBJECT *prev;   //Previous object in linked list
} OBJECT;

The requested block of memory will be placed into a doubly linked list, and the number of pointers at the top of the block of memory will be remembered.

When CollectGarbage() is called, all of the memory blocks will be marked as unreferenced. Then all of the registered fixed root pointers will be used to find the memory blocks that they reference. Each object will then recursively visit any other objects that it references. Each visited object will be marked as referenced. This is commonly known as the "mark" phase of a garbage collector.

Next, any unreferenced objects in the linked list will be freed. This is commonly known as the "sweep" phase of a garbage collector.

A short example illustrates how this could be used:

#define TREE_NODE_PTR_COUNT 2
  typedef struct TreeNode {
  struct TreeNode _gc   *left,   *right;   //pointers at top
  int value;
} TreeNode;

TreeNode *top, *node0, *node1, *node2;

int main()
{
  node0 = (TreeNode*)CollectMalloc(sizeof(TreeNode),   TREE_NODE_PTR_COUNT);
  node1 = (TreeNode*)CollectMalloc(sizeof(TreeNode),   TREE_NODE_PTR_COUNT);
  node2 = (TreeNode*)CollectMalloc(sizeof(TreeNode),   TREE_NODE_PTR_COUNT);

  CollectRoot(&top);   //define a fixed root node
  top = node0;
  node0->left = node2;

  CollectGarbage();   //frees unreferenced node1
  top = NULL;
  CollectGarbage();   //frees node0 and node2
  return 0;
}

This trivial garbage collector implementation enables the developer to focus on the algorithm and data structures instead of worrying about memory leaks. The garbage collector could be improved by adding semaphore protection to support multiple threads, and by placing an additional magic value at the end of all allocated memory blocks to better detect heap corruption.

Steve Rhoads, PE, (MSEE) works at Thomson developing video set-top-boxes. He is the author of the open source Plasma CPU project which also includes a RTOS and TCP/IP stack at www.opencores.org/?do=project&who=plasma. You may reach him at rhoadss@yahoo.com.

(Editor's Note: The source code for Heap Garbage Collection in C is now available on the Embedded.com Source Code Upload/Download Archive.





DKC

8/17/2009 4:39 AM EDT

If you want garbage collection use Java. Using C is about fine grained control and predictable performance, which in my experience doesn't have anything to do with garbage collection. Garbage collection is mostly for lazy programmers. I use C++ and manage my memory use carefully, speaking of which: garbage collection was invented in 1959 (http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29) and C was 1972 it seems odd that anyone is recycling this in 2009.

Sign in to Reply



JMWilliams

8/17/2009 2:17 PM EDT

Lack of garbage collection is not a limitation of C. From a performance perspective, the inclusion of garbage collection is a limitation of Java.

The reason for this is that garbage collection is incompatible with demand-paging or CPU cacheing. As soon as the garbage collector starts to work, cached data or programs are replaced with storage referencing memory objects to be collected. This forces the CPU to reaccess slower memory, on disk or in main memory, to continue reading and executing currently idle, nongarbage programs.

Inclusion of garbage collection in an application language is a design error, because such a language will not be able to take into account other system considerations such as performance. If garbage collection is to be included as a system function, it should be part of the kernel, not part of an application language. Or, maybe a hardware coprocessor, with a dedicated garbage cache, could be included in the system? Even then, my opinion is that it still would impact performance; it's a good idea, but I just don't see any way that garbage collection, as a language-dependent, independent idle-time process, could fail to reduce system performance.

Instead of garbage collection, proper memory allocation and deallocation by the application should be employed. This is automatically done when coding classes properly (viz., class destructors) in C++.

Sign in to Reply



z80man

8/20/2009 1:56 PM EDT

C was designed to be an efficent language and that could be used to do anything. BASIC was designed to be quick and dirty to do things that were with-in the limitations of the language and as such uses a garbage collector to make life easy for the programmer.

At age 12 I learned BASIC on a TRS-80 Model I and at the time I was very lucky to have access to any kind of computer. I wrote a lot of BASIC and learned the limitations of the hardware I was on and the limitation that BASIC imposed, which lead me to learning to program in assembly. I learned everything there was to know about the hardware by studying schematic diagrams and manually disassembling the entire ROM. This experience taught me to think like a CPU and throught the years in all the languages I have used I still think about how it translates to CPU instructions.

I love C and C++ because it translates to CPU instructions so directly and they have purposely left out things like garbage collection make the code execution unpredictable. On the other hand I have had numerous experiences with finding tricky bugs in C/C++ due to small errors on my part. Over the years I have come up with my bag of tricks (reusable code modules) to help do things that are needed so often and require a lot of debugging.

A several years ago I come upon a problem that did not seem to have a good solution in C++ but would be easily accomplished in Java due to its garbage collection abilities. The problem was storing large lists of strings that were passed around between threads with highly variable code execution paths. It become very difficuly to determine which code should be responsible for destroying the strings. I solved the problem by writing a class similar to CString but with the capability use a dedicated memory pool for allocating storage and cleaning up the strings objects using reference counting. Because it is not built into the language there was some limitations but its made memory management much easier.

Fast forward a few years and the same problem came up again but this time is was not just strings but a large set of data structures that required complex transformation to other data structures. My solution this time was to create a base class that has the memory management and reference counting built in. Then all my structures and any other objectects I choose could use this base class to give themselves the ability to manage themselves. It tooks a lot of operator overloading and complex debugging particulary when multiple inheritance comes into play, but in the end I had a solution that worked well as long as you followed a few rules.

Once again because the capability is not baked into the language there are some ways to shoot yourself in the foot, and particulary with multiple inheritance there is some extra memory used to implement the classes. What I have done is not garbage collection but simply making the memory management predefined for any object derived from my base class. If this was baked into C++, managing objects could become a whole lot easier without going to the extreme of garbage collection and the constant creating and copying objects that happens so often in .NET and Java.

Sign in to Reply



Please sign in to post comment

Navigate to related information

Datasheets.com Parts Search

185 million searchable parts
(please enter a part number or hit search to begin)

Feedback Form