size_t is awesome - but I rarely see it employed correctly. You may think I'm here to lay down some dogmatic "type correct" philosophy, but I'd rather give a pragmatic treatment of how to use size_t. First, though, some background.
Background
Predominantly, size_t is used to represent the size of objects. For example, malloc is prototyped as:
void * malloc( size_t );
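As such, size_t needs to be scalable: it must be wide enough to hold the size of the largest object the platform supports. The C standard only says that size_t is an implementation-defined unsigned integer type, but in practice it is 32 bits wide on mainstream 32-bit platforms (usually an unsigned int) and 64 bits wide on mainstream 64-bit platforms (an unsigned long on 64-bit Linux and macOS, an unsigned long long on 64-bit Windows).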
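If you'd like to check the width on your own toolchain rather than take my word for it, here's a minimal sketch. The helper names size_t_bits and print_size_t_info are my own inventions for illustration, not anything from a standard header.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */

/* Hypothetical helper: width of size_t in bits on the current build. */
size_t size_t_bits(void)
{
    return sizeof(size_t) * CHAR_BIT;
}

/* Prints the width; %zu is the printf conversion for size_t (C99 and later). */
void print_size_t_info(void)
{
    printf("size_t is %zu bits wide\n", size_t_bits());
}
```

Call print_size_t_info() from your main. On typical desktop builds I'd expect 32 or 64, but the only honest answer is whatever your compiler reports.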
The next most common use of size_t is for the "length" of strings and containers. For example, strlen is prototyped as:
size_t strlen( const char * );
The pragmatic view of size_t
It's pretty simple: know your application's needs.
For example, if your app has a ton of classes/structs for storing fixed-length strings (I'm talking char[]) that are programmatically limited to 255 characters or fewer, storing their length with size_t is overkill - especially on 64-bit systems. In this case, you can get away with using an unsigned char for storing your strings' lengths. (On a 64-bit system you just saved about seven bytes per string: sizeof(size_t) - sizeof(unsigned char), give or take struct padding. Multiplied across many objects, these savings add up quickly.)
In contrast, if your application places no restriction on the length of strings stored - you'll definitely want to use size_t for storing string lengths.
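To make the fixed-length case concrete, here's a rough sketch of the two layouts side by side. Everything in it is hypothetical: the NAME_CAPACITY limit, the struct names, and the name_compact_set helper are made up for illustration. The exact saving depends on how your compiler pads the structs, so compare sizeof on your own build.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen, memcpy */

#define NAME_CAPACITY 255  /* hypothetical limit; the largest value an 8-bit unsigned char can hold */

/* Length stored as size_t: typically 8 bytes of bookkeeping on 64-bit builds. */
struct name_wide {
    size_t length;
    char   text[NAME_CAPACITY];
};

/* Length stored as unsigned char: 1 byte, enough for lengths 0..255. */
struct name_compact {
    unsigned char length;
    char          text[NAME_CAPACITY];  /* not NUL-terminated; length says where it ends */
};

/* Copies src into dst. Returns 0 on success, -1 if src exceeds the limit. */
int name_compact_set(struct name_compact *dst, const char *src)
{
    size_t n = strlen(src);  /* measure with size_t first */
    if (n > NAME_CAPACITY)
        return -1;
    memcpy(dst->text, src, n);
    dst->length = (unsigned char)n;  /* safe: n <= 255 was checked above */
    return 0;
}
```

Note that the length is still measured as a size_t and only narrowed after the range check. The small type is a storage decision, not an excuse to skip validation.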
If you're rolling your own container objects (e.g., vector, dynamic array, hash table), my preference is to use size_t for storing your element size and your element count. As the author of a "generic container", it is difficult to predict its usage - and size_t provides the maximum headroom. That said, if you're watching your memory usage closely - and you can reliably predict your application's usage of your container - you can choose a smaller built-in type for storing your element size and/or element count. You would do this to save memory, of course - and depending on the container's usage, you could see big savings.
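Here's roughly what I mean for the generic case, sketched as a bare-bones dynamic array. The dyn_array struct and its two functions are hypothetical names of my own, and a real container would need growth, element access, and so on. The point is only that element size, count, and capacity are all size_t.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical generic dynamic array: all bookkeeping is size_t. */
struct dyn_array {
    void  *data;
    size_t elem_size;  /* size of one element, in bytes */
    size_t count;      /* elements currently stored */
    size_t capacity;   /* elements the buffer can hold */
};

/* Returns 0 on success, -1 on a bad request or allocation failure. */
int dyn_array_init(struct dyn_array *a, size_t elem_size, size_t capacity)
{
    /* Reject requests whose total byte size would not fit in size_t. */
    if (elem_size == 0 || capacity > SIZE_MAX / elem_size)
        return -1;

    a->data = NULL;
    if (capacity > 0) {
        a->data = malloc(elem_size * capacity);
        if (a->data == NULL)
            return -1;
    }
    a->elem_size = elem_size;
    a->count     = 0;
    a->capacity  = capacity;
    return 0;
}

/* Releases the buffer and resets the counters. */
void dyn_array_free(struct dyn_array *a)
{
    free(a->data);
    a->data = NULL;
    a->count = 0;
    a->capacity = 0;
}
```

Because the fields match malloc's parameter type, the byte-count multiplication needs no casts, and the overflow check is a single comparison against SIZE_MAX.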
My pet peeve for size_t is when I see it cast away needlessly. For example:
int length = (int)strlen( psz );
for ( int i = 0; i < length; i++ )
...
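The above is just lazy coding. Why downcast the length of psz? (Probably: "to quiet the compiler's 'downcast' warning".) The cast silently truncates any length that doesn't fit in an int, and on 64-bit systems the compiler may have to emit extra instructions to widen the int index each time it is used to address memory. Stack is cheap in all cases, so the narrower type buys you nothing. In these cases, I always use size_t: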
size_t length = strlen( psz );
for ( size_t i = 0; i < length; i++ )
...
The above is cleaner, faster, and in my opinion, smarter.
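For completeness, here's the same pattern as a small self-contained function. The count_char helper is a hypothetical example of mine; any loop body would do.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Hypothetical helper: counts how many times ch occurs in psz. */
size_t count_char(const char *psz, char ch)
{
    size_t length = strlen(psz);  /* no cast: strlen already returns size_t */
    size_t hits = 0;

    for (size_t i = 0; i < length; i++) {
        if (psz[i] == ch)
            hits++;
    }
    return hits;
}
```

The length, the index, and the return value all share one type, so there is nothing for the compiler to warn about and nothing to cast.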
A couple of final items of note:
- size_t is completely portable. Use it with confidence between Windows and Unix. As mentioned, on mainstream 32-bit systems it is 4 bytes wide; on mainstream 64-bit systems it is 8 bytes wide.
- size_t is not a built-in type; it is a typedef for one. When name mangled, it is reduced to the underlying built-in type. Therefore, when you dumpbin/nm your library you will see your usages of size_t replaced by unsigned int (32-bit systems) or by the platform's 64-bit unsigned type (64-bit systems): unsigned __int64 on Windows, unsigned long on most Unix systems.
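If you're curious which built-in type size_t reduces to on your build without digging through dumpbin/nm output, a C11 compiler can tell you. This is only a sketch: size_t_underlying_type is a name I made up, and it assumes your compiler supports _Generic.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical helper: names the built-in type that size_t aliases here. */
const char *size_t_underlying_type(void)
{
    /* _Generic (C11) picks the branch matching the type of its first operand. */
    return _Generic((size_t)0,
        unsigned int:       "unsigned int",
        unsigned long:      "unsigned long",
        unsigned long long: "unsigned long long",
        default:            "some other unsigned type");
}
```

Print the returned string from your main. Whatever it reports is the type you should expect to see standing in for size_t in your mangled symbols.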
In a future post, I'd like to get into portable usage of integral built-ins like int, long, long long, 64-bit ints, and pointers. But if you have questions on size_t - let's have 'em.