Some words on malloc and memset


Some people seem to have a particular strategy when it comes to using malloc and memset. The strategy is simple:

Use malloc often, then memset everything. [don’t do this at home]

The bad thing about that strategy is that it leads to complex code and non-obvious bugs. The following mini-tutorial is based on some slides I prepared for a small tutorial for our students at RWTH Aachen University.

I’d like to point out that, for the sake of simplicity, I did not go into const correctness for variables and functions. Adding const correctness is left to the interested reader (just as we added it on the fly during the tutorial).

Memory is not just memory:

When writing basic C code, you have to distinguish between two types of memory. I won’t go into details here, but I’d like to point out the most obvious differences. If you’d like to know more about memory in C, I am confident that your Google skills will get you to the right documents.

• Stack – the simple kind of memory.

  • Parameters:
    void foo(int bar){…}
  • Auto variables:
    unsigned int foo;
    char bar[20];

• Heap – the “shoot yourself in the knee”-kind of memory.

  • Manually allocated
    foo = malloc(…);

Try to use stack memory whenever possible. The reason? It’s simple: it’s bound to the current program context (the enclosing function or block), and when the program leaves that context, the memory is freed automatically. That means the memory is freed as soon as you don’t need it any more.

Well, there are some reasons to still use heap memory:

  1. If you ABSOLUTELY don’t know the size of the memory block you need at compile time: use malloc.
  2. You’re in a function and you MUST provide some data to the caller, and you really don’t want to return a copy of the data.
    You can return everything that does not require a deep copy. For example, plain structs without pointers are fine as return values. If that does not work: consider using malloc.
    But first: try allocating memory at the caller.
    Second: try harder.
    If it’s absolutely impossible: use malloc.
  3. Other reasons:
    10 You think it is convenient
    20 Think again
    30 You still think its convenient
    40 Goto 20

Especially the third reason is tricky. We’ve seen students who used malloc just because they thought it would be the [only/most convenient] way to have a pointer to a piece of memory. The bottom line is: try to avoid malloc and heap memory whenever possible.

Avoid magic numbers in combination with malloc

Okay, now that we know that malloc is evil, it’s time to talk about using it right. Unfortunately, malloc seems to attract so-called magic numbers. Let me give an example:

Bad:

malloc(20);
malloc(BUFF_SIZE+12);

This code is almost guaranteed to break as soon as anyone changes anything related to that memory. Moreover, you will drive your poor fellow students and colleagues crazy because they have no clue why 20 or 12 is the right magic number. Maybe 21 or 11 would be much better? Simply don’t do it: use sensible names or, even better, sizeof.

Better:

malloc(sizeof(struct foo));

Here you can easily figure out why malloc allocates the amount of memory it does: we’re going to put a struct foo into that piece of memory. Great.

Avoid malloc completely

In many cases, malloc can be avoided completely. This simplifies the code and improves its maintainability. Let me give you a code example:

EXAMPLE 1

// the return value has to be freed. Don't forget it!
struct foo* bar(void){
    struct foo *baz;
    baz = malloc(sizeof(struct foo));
    if (baz == NULL)
        return NULL; // allocation failed, nothing to free
    // do something with baz ...

    return baz;
}
struct foo* baz;
baz = bar();

// do something fancy with baz for 80 lines ...

free(baz); // Phew, I am so proud that I remembered it!

Okay, this code isn’t really bad by itself. It works and it does whatever it was meant to do. However, it ties up programmer brain cycles. You as a programmer have to make sure that you remember to free baz at the end (and never remove that free unless you remove baz). Otherwise the code will still do what it was supposed to do, but as a little bonus, it will leak memory every time it is executed. This is especially tricky if you leave a function early with a return statement, because every such exit path needs its own free.

It’s rather simple to avoid malloc and free here completely:

void bar(struct foo* baz){
    // do something with baz
    return;
}

...

struct foo baz;
bar(&baz);

// do something fancy with baz for 80 lines

The code still does the very same thing but there is no malloc and no free any more. We avoided the use of these two functions by creating an auto variable at the caller and by passing it to the callee. Oh, yes: Here you should definitely start thinking about const correctness in your code because the function bar will modify its parameter baz. This might not be obvious unless you a) put some comment there (yes, a bad option) or b) add const to all function parameters that are not modified by the function. However, that’s another story for another day.

EXAMPLE 2

unsigned int len;
len = string_oracle();

char* baz;
baz = malloc(len);

...

//do something with baz

foo(baz, len);

...

free(baz);

The reason why we used malloc in the above code is that we don’t know how long the string baz will be. The len value comes from the mysterious string_oracle() function, and there is no upper bound on it since the nasty oracle may return anything. However, if the string oracle is not that mysterious but has an upper bound (e.g., LONGEST_STRING_LEN) for the length of the string, we can do the following:

unsigned int len;
len = string_oracle();

char baz[LONGEST_STRING_LEN];

...

//do something with baz

foo(baz, len);

...

The worst-case assumption that the string oracle never returns a length larger than LONGEST_STRING_LEN allows us to get rid of malloc and free. In many cases you can make such worst-case assumptions.

If you read the above example carefully, you will notice that the code is still a bit shaky. Of course, the value of len must never be larger than LONGEST_STRING_LEN. Otherwise, any code that uses len to access baz runs past the end of the array: the program may crash with a segmentation fault or, worse, silently corrupt other memory. Hence, you should check the value of len and do some nice error handling.

Memset

As far as I can tell, memset is malloc’s best friend: wherever you see malloc, you can be fairly sure to see a memset. There is even a shorthand for the combination of malloc and a zeroing memset: calloc. In some cases, using malloc and memset together is not a bad thing. In others, however, the memset is a) unnecessary or b) masks errors. I’ll give you two examples:

EXAMPLE 1

char* foo = "Hey, my code works!";
unsigned int len = 20; // strlen(foo) + 1, i.e. including the terminating '\0'

char* bar;
bar = malloc(len);
// my advisor told me I should ALWAYS use memset 0!
memset(bar, 0, len);
memcpy(bar, foo, len);
printf("%s ... and it is totally braindead.\n", bar);
free(bar);

Here the memset is absolutely unnecessary because the memory is first set to 0 and then completely overwritten by memcpy. Of course, the code will work, but it a) does too much and b) is too complicated. Complicated code attracts bugs, so why make it more complicated than it needs to be? Just remove the memset.

Okay, so what is memset good for? Initializing memory to 0 (zero) or any other value when that memory really should hold that value for a reason. But memset can be tricky in that case, too: it might set memory that is not supposed to hold any value yet. Let me give an example:

EXAMPLE 2

Compilers are pretty good at helping programmers do the right thing. One very useful feature is that gcc can tell you when you forgot to initialize a variable. Just try:

$> gcc foobar.c -Wuninitialized -O1

on any C file foobar.c that reads an uninitialized variable such as bar.foo3. The compiler will complain like this:

warning: ‘bar.foo3’ is used uninitialized in this function

Okay, now consider the following example code:

struct foo {
    unsigned int foo1;  // initialize to 1!
    unsigned int foo2;  // initialize to 2!
    unsigned int foo3;  // initialize to 3!
};

...

struct foo bar;

// my advisor told me I should ALWAYS use memset 0!
memset(&bar, 0, sizeof(struct foo));
// when initialized, foo1 must always be 1, foo2 always 2 and foo3 always 3
bar.foo1 = 1;
bar.foo2 = 2;
// oops. I am in a hurry. I'll do the rest of the initialization later.
...
if (bar.foo1 == bar.foo3) {...}

Now suppose our hasty programmer never does the rest of the initialization (maybe he even forgot to write that comment). Because he diligently set the memory of bar to zero, bar.foo3 is initialized, and the compiler can’t tell us that something is wrong. Here memset masks a bug that would otherwise have been trivial to find, since the compiler itself would have reported it:

$> gcc foobar.c -Wuninitialized -O1
... (NOTHING!)

For more evil examples of memset, read this article. It explains what happens when you apply memset with non-zero values (it doesn’t do what people expect for multi-byte types) and to more complex types (structs, objects, double pointers, etc.).

Well, this concludes my little tutorial. Thanks for reading and happy coding.
