baremetalcode: using size_t pragmatically

size_t is awesome - but I rarely see it employed correctly. You may think I'm here to lay down some dogmatic "type correct" philosophy, but I'd rather give a pragmatic treatment of how to use size_t. First, though, some background.

Background

Predominantly, size_t is used to represent the size of objects. For example, malloc is prototyped as:

void * malloc( size_t );

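As such, size_t needs to scale with the platform. On typical 32-bit platforms, size_t is a 32-bit unsigned integer (usually unsigned int). On typical 64-bit platforms, size_t is a 64-bit unsigned integer (unsigned long on most Unix systems, unsigned long long on 64-bit Windows).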
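If you want to check what size_t looks like on your own platform, here is a minimal sketch. The helper names size_t_bits and size_t_max are mine, purely for illustration - they are not part of any library.

```cpp
#include <climits>      // CHAR_BIT
#include <cstddef>      // std::size_t
#include <cstdint>      // SIZE_MAX
#include <type_traits>  // std::is_unsigned

// size_t is always an unsigned integer type; only its width varies by platform.
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");

// Hypothetical helper: width of size_t in bits on the platform being compiled for.
constexpr unsigned size_t_bits() {
    return static_cast<unsigned>(sizeof(std::size_t) * CHAR_BIT);
}

// Hypothetical helper: the largest value a size_t can hold on this platform.
constexpr std::size_t size_t_max() {
    return SIZE_MAX;
}
```

On a typical 64-bit build size_t_bits() comes out as 64, and on a typical 32-bit build as 32 - but let the compiler tell you rather than assuming.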

The next most predominant usage of size_t is the "length" of strings and containers. For example, strlen is prototyped as:

size_t strlen( const char * );

Also, many STL containers (e.g., vector, queue, etc.) have methods for returning their "length". These "length" and "size" methods (when boiled down to primitives) also return size_t. Formally they return the container's size_type, which is normally size_t.
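A short sketch of what this means in practice: if strlen and the containers hand you size_t, you can keep your lengths in size_t end to end. The helpers vector_size_is_size_t and total_length are hypothetical names I made up for this example.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::strlen
#include <type_traits>  // std::is_same
#include <vector>

// strlen is specified to return size_t, so this check holds everywhere.
static_assert(std::is_same<decltype(std::strlen("")), std::size_t>::value,
              "strlen returns size_t");

// Hypothetical helper: true when this standard library's vector reports
// its size as size_t (expected on mainstream implementations).
constexpr bool vector_size_is_size_t() {
    return std::is_same<std::vector<int>::size_type, std::size_t>::value;
}

// Hypothetical helper: sums string lengths without ever leaving size_t.
std::size_t total_length(const std::vector<const char*>& strings) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < strings.size(); ++i) {
        total += std::strlen(strings[i]);
    }
    return total;
}
```

Note there is not a single cast in total_length - the types already line up.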

The pragmatic view of size_t

It's pretty simple: know your application's needs.

For example, if your app has a ton of classes/structs for storing fixed length strings (I'm talking char[]) that are programmatically limited to 255 characters or fewer, storing their length with size_t is overkill - especially on 64-bit systems. In this case, you can get away with using an unsigned char for storing your string's lengths. (On a 64-bit system you just saved seven bytes per length: sizeof(size_t) - sizeof(unsigned char). Struct padding can shift the exact number, so check sizeof on the whole struct. Multiplied across many objects, these savings add up quickly.)
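To make the trade-off concrete, here is a sketch of the two layouts side by side. Everything here (kMaxNameLength, NameWide, NameNarrow, set_name) is a hypothetical name for illustration, and I'm assuming length-prefixed strings that are not null-terminated.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen, std::memcpy

// Assumed application limit: 255 is the largest length an 8-bit unsigned char can store.
const std::size_t kMaxNameLength = 255;

// Length stored as size_t: maximum headroom, but wasted here.
struct NameWide {
    std::size_t length;
    char text[kMaxNameLength];
};

// Length stored as unsigned char: enough for the assumed limit.
struct NameNarrow {
    unsigned char length;
    char text[kMaxNameLength];  // length-prefixed, not null-terminated
};

// Hypothetical setter: refuses anything the narrow length field cannot represent.
bool set_name(NameNarrow& name, const char* source) {
    const std::size_t length = std::strlen(source);
    if (length > kMaxNameLength) {
        return false;  // would not fit in the buffer or in the length field
    }
    std::memcpy(name.text, source, length);
    name.length = static_cast<unsigned char>(length);
    return true;
}
```

On a typical 64-bit compiler, sizeof(NameWide) is 264 and sizeof(NameNarrow) is 256, so once alignment padding is counted the narrow version saves eight bytes per object. The price is the range check in set_name - the smaller type only works because the limit is enforced.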

In contrast, if your application places no restriction on the length of strings stored - you'll definitely want to use size_t for storing string lengths.

If you're rolling your own container objects (e.g., vector, dynamic array, hash table), my preference is to use size_t for storing your element size and your element count. As an author of a "generic container", it is difficult to predict its usage - and size_t provides the maximum headroom. That said, if you're watching your memory usage closely - and you can reliably predict your application's usage of your container - you can choose a smaller built-in for storing your element size and/or element count. You would do this to save memory, of course - and depending on the container's usage, you could see a big savings.
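One way to keep both options open is to make the count type a template parameter that defaults to size_t. This is only a sketch under my own assumptions - ArrayHeader and fits_count are hypothetical names, and a real container would need allocation, growth, and cleanup on top of this.

```cpp
#include <cstddef>      // std::size_t
#include <cstdint>      // std::uint16_t, std::uint32_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::is_unsigned

// Hypothetical bookkeeping for a dynamic array. CountT defaults to size_t
// for maximum headroom; callers who know their limits can pick something smaller.
template <typename T, typename CountT = std::size_t>
struct ArrayHeader {
    static_assert(std::is_unsigned<CountT>::value, "CountT must be unsigned");
    static_assert(sizeof(CountT) <= sizeof(std::size_t),
                  "CountT must not be wider than size_t");
    T* data;
    CountT count;     // elements in use
    CountT capacity;  // elements allocated
};

// Hypothetical guard: can a requested element count be stored in CountT?
template <typename CountT>
bool fits_count(std::size_t requested) {
    return requested <=
           static_cast<std::size_t>(std::numeric_limits<CountT>::max());
}
```

On a typical 64-bit compiler, ArrayHeader<int> is 24 bytes and ArrayHeader<int, std::uint32_t> is 16. Dropping further to std::uint16_t usually still gives 16, because the pointer forces 8-byte alignment - so measure with sizeof before you trade away headroom.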

My pet peeve for size_t is when I see it cast away needlessly. For example:

int length = (int)strlen( psz );

for ( int i = 0; i < length; i++ )

...

The above is just lazy coding. Why downcast the length of psz? (Probably: "to quiet the compiler 'downcast' warning".) On 64-bit systems, the cast throws away the upper 32 bits of the length, and the compiler may have to sign-extend the int index before it can use it as an offset. In all cases, stack is cheap. In these cases, I always use size_t:

size_t length = strlen( psz );

for ( size_t i = 0; i < length; i++ )

...

The above is cleaner, faster, and in my opinion, smarter.
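For completeness, here is the same pattern filled out into something you can compile. The function names count_char and last_index_of are hypothetical, invented for this sketch. The second function covers the one loop shape where size_t needs care: counting down, where a condition like i >= 0 is always true for an unsigned type.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen

// Hypothetical helper: counts occurrences of `wanted`, indexing with size_t throughout.
std::size_t count_char(const char* psz, char wanted) {
    const std::size_t length = std::strlen(psz);
    std::size_t matches = 0;
    for (std::size_t i = 0; i < length; i++) {
        if (psz[i] == wanted) {
            matches++;
        }
    }
    return matches;
}

// Hypothetical helper: index of the last occurrence of `wanted`,
// or the string's length if it does not occur.
std::size_t last_index_of(const char* psz, char wanted) {
    const std::size_t length = std::strlen(psz);
    // Test-then-decrement: i runs from length - 1 down to 0 and the loop
    // stops cleanly, without ever relying on i going negative.
    for (std::size_t i = length; i-- > 0; ) {
        if (psz[i] == wanted) {
            return i;
        }
    }
    return length;
}
```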

A couple of final items of note:

size_t is completely portable: it is defined by the C and C++ standards, so use it with confidence between Windows and Unix. Only its width varies. As mentioned, on 32-bit systems it is typically a 4-byte unsigned integer; on 64-bit systems, it is typically an 8-byte unsigned integer.
size_t is not a built-in type. It is a typedef for one of the built-in unsigned integer types, so when name mangled, it is reduced to that underlying type. Therefore, when you dumpbin/nm your library you will see your usages of size_t replaced by the underlying type: typically unsigned int on 32-bit systems, and unsigned long (most 64-bit Unix systems) or unsigned __int64 (64-bit Windows) on 64-bit systems.
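If you're curious which built-in type size_t resolves to with your compiler (and therefore what you should expect to see in dumpbin/nm output), a small sketch like this will tell you. The function name size_t_underlying_type is hypothetical, and the list of candidate types is my assumption about what mainstream compilers use.

```cpp
#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_same

// Hypothetical helper: names the built-in type that size_t is a typedef for.
// Only the usual candidates are checked; anything else reports "other".
const char* size_t_underlying_type() {
    if (std::is_same<std::size_t, unsigned int>::value) {
        return "unsigned int";
    }
    if (std::is_same<std::size_t, unsigned long>::value) {
        return "unsigned long";
    }
    if (std::is_same<std::size_t, unsigned long long>::value) {
        return "unsigned long long";
    }
    return "other";
}
```

Because size_t is only an alias, a function taking size_t and a function taking that underlying type are the same function as far as the compiler and linker are concerned.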

In a future post, I'd like to get into portable usage of integral built-ins like int, long, long long, 64-bit ints, and pointers. But if you have questions on size_t - let's have 'em.

baremetalcode

Tuesday, January 24, 2012

using size_t pragmatically
