size_t is awesome - but I rarely see it used correctly. You may think I'm here to lay down some dogmatic "type correct" philosophy; instead, I'd like to give a pragmatic treatment of using size_t. First, though, some background.
Background
Predominantly, size_t is used to represent the size of objects. For example, malloc is prototyped as:
void * malloc( size_t );
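As such, size_t needs to scale with the platform's address space. On typical 32-bit platforms, size_t is a 32-bit unsigned integer (usually unsigned int). On typical 64-bit platforms, it is a 64-bit unsigned integer (unsigned long on most Unix systems, unsigned long long or unsigned __int64 on 64-bit Windows).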
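If you want to check what size_t looks like on your own platform, you can simply ask the compiler. Here is a minimal sketch; the names size_t_bits and print_size_t_info are mine, not part of any standard library.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX
#include <cstdio>    // std::printf

// Width of size_t in bits on the platform this is compiled for.
constexpr std::size_t size_t_bits = sizeof( std::size_t ) * CHAR_BIT;

// Hypothetical helper (not a standard function): prints what size_t looks like here.
void print_size_t_info()
{
    // %zu is the printf conversion for size_t (C99 and C++11 onward).
    std::printf( "size_t is %zu bits wide\n", size_t_bits );
    std::printf( "largest size_t value: %zu\n", static_cast<std::size_t>( SIZE_MAX ) );
}
```

Call print_size_t_info() from main in a 32-bit and a 64-bit build to compare the two.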
The next most predominant usage of size_t is the "length" of strings and containers. For example, strlen is prototyped as:
size_t strlen( const char * );
The pragmatic view of size_t
It's pretty simple: know your application's needs.
For example, if your app has a ton of classes/structs for storing fixed length strings (I'm talking char[]) that are programmatically limited to 255 characters or fewer, storing their length with size_t is overkill - especially on 64-bit systems. In this case, you can get away with using an unsigned char for storing your strings' lengths. (On a 64-bit system you just saved about seven bytes per string: sizeof(size_t) - sizeof(unsigned char). The exact figure depends on alignment padding. Multiplied across many objects, these savings add up quickly.)
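To make the trade-off concrete, here is a rough sketch of two fixed-length string records. The struct names, the 256-byte buffer, and the set_name helper are all assumptions of mine for illustration.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen, std::memcpy

// Hypothetical fixed-length string records; names are illustrative only.
struct WideName
{
    char        text[256];   // up to 255 characters plus the terminating '\0'
    std::size_t length;      // 4 or 8 bytes, plus any alignment padding
};

struct NarrowName
{
    char          text[256];
    unsigned char length;    // 1 byte; can represent 0..255
};

// On mainstream platforms the narrow record is the smaller of the two.
static_assert( sizeof( NarrowName ) < sizeof( WideName ), "narrow length should save space" );

// Stores psz in rec. The caller must respect the 255-character limit.
void set_name( NarrowName & rec, const char * psz )
{
    std::size_t length = std::strlen( psz );
    assert( length <= 255 );                            // enforce the programmatic limit
    std::memcpy( rec.text, psz, length + 1 );           // copy including the '\0'
    rec.length = static_cast<unsigned char>( length );  // safe: length fits in 0..255
}
```

Note that the cast here is a deliberate narrowing backed by a checked limit, which is the whole point of knowing your application's needs.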
In contrast, if your application places no restriction on the length of strings stored - you'll definitely want to use size_t for storing string lengths.
If you're rolling your own container objects (e.g., vector, dynamic array, hash table), my preference is to use size_t for storing your element size and your element count. As an author of a "generic container", it is difficult to predict its usage - and size_t provides the maximum headroom. That said, if you're watching your memory usage closely - and you can reliably predict your application's usage of your container - you can choose a smaller built-in for storing your element size and/or element count. You would do this to save memory, of course - and depending on the container's usage, you could see a big savings.
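As a sketch of that choice, consider the bookkeeping fields of a dynamic array of int. ArrayHeader, GenericHeader, and CompactHeader are hypothetical names, and allocation is left out entirely; the only point is how the count type changes the footprint.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint16_t

// Hypothetical bookkeeping for a dynamic array of int.
// CountT is the type used for the element count and capacity.
template <typename CountT>
struct ArrayHeader
{
    int *  elements;   // heap storage (allocation not shown)
    CountT count;      // elements in use
    CountT capacity;   // elements allocated
};

typedef ArrayHeader<std::size_t>   GenericHeader;   // maximum headroom
typedef ArrayHeader<std::uint16_t> CompactHeader;   // at most 65535 elements

// The compact header is never larger, and is usually smaller.
static_assert( sizeof( CompactHeader ) <= sizeof( GenericHeader ),
               "compact header should not be larger" );
```

The saving only matters if you have many containers alive at once; for a handful of large containers, the header size is noise and size_t is the safer default.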
My pet peeve with size_t is when I see it cast away needlessly. For example:
int length = (int)strlen( psz );
for ( int i = 0; i < length; i++ )
    ...
The above is just lazy coding. Why downcast the length of psz? (Probably: "to quiet the compiler's 'downcast' warning".) On 64-bit systems, you may be adding conversion instructions, and a string longer than INT_MAX would silently produce a wrong length. In all cases, stack is cheap. In these cases, I always use size_t:
size_t length = strlen( psz );
for ( size_t i = 0; i < length; i++ )
    ...
The above is cleaner, faster, and in my opinion, smarter.
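For completeness, here is the same pattern as a small, complete function. count_spaces is a hypothetical helper I made up for the example; any loop over a C string would do.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen

// Hypothetical helper: counts the space characters in a C string.
std::size_t count_spaces( const char * psz )
{
    assert( psz != nullptr );                   // precondition: a valid C string
    std::size_t spaces = 0;
    std::size_t length = std::strlen( psz );    // no cast: strlen already returns size_t
    for ( std::size_t i = 0; i < length; i++ )
    {
        if ( psz[i] == ' ' )
            spaces++;
    }
    return spaces;
}
```

The length, the index, and the return value all share one type, so there is nothing for the compiler to warn about.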
A couple final items of note:
- size_t is completely portable. Use it with confidence between Windows and Unix. As mentioned, on typical 32-bit systems it is a 32-bit unsigned integer (4 bytes); on typical 64-bit systems it is a 64-bit unsigned integer (8 bytes).
- size_t is not a built-in type; it is a typedef for one. When name mangled, it is reduced to the underlying built-in type. Therefore, when you dumpbin/nm your library you will see your usages of size_t replaced by unsigned int (32-bit systems) or by a 64-bit unsigned type such as unsigned long or unsigned __int64 (64-bit systems).
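You can see the typedef nature of size_t directly with type traits. This is a sketch that assumes a C++11 compiler; size_t_underlying_name is a hypothetical helper, and the list of candidate types covers mainstream toolchains only.

```cpp
#include <cstddef>       // std::size_t
#include <type_traits>   // std::is_same, std::is_unsigned

// size_t is always an unsigned integer type.
static_assert( std::is_unsigned<std::size_t>::value, "size_t is unsigned" );

// Hypothetical helper: names the built-in type that size_t aliases here.
// Only the usual candidates are checked; other platforms fall through.
constexpr const char * size_t_underlying_name()
{
    return std::is_same<std::size_t, unsigned int>::value       ? "unsigned int"
         : std::is_same<std::size_t, unsigned long>::value      ? "unsigned long"
         : std::is_same<std::size_t, unsigned long long>::value ? "unsigned long long"
         : "another unsigned type";
}
```

Because size_t is the same type as one of these, a function overloaded on both size_t and that built-in type will not compile: the two declarations are identical.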
In a future post, I'd like to get into portable usage of integral built-ins like int, long, long long, 64-bit ints, and pointers. But if you have questions on size_t - let's have 'em.
Do you have any idea why they chose to make size_t unsigned? It is inconvenient for backwards loops:
std::vector<int> v;
// fill v ...
for( size_t ii=v.size()-1; ii>=0; --ii ) // bad idea
{
}
Is there an idiom for doing this better?
My guess is that size_t is unsigned for a couple of reasons:
* it measures a dimensional quantity
* importantly, it can scale to the extents of addressable space
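Your example is a good one. For those wondering what is going on, when ii finally gets to 0 (zero), then decrements one final time, instead of becoming -1, it wraps around to the largest value size_t can hold: 4294967295 on a 32-bit system, or 18446744073709551615 on a 64-bit one. Because an unsigned value is never negative, the test ii>=0 is always true. The loop fails to terminate, reading memory outside the vector and most likely crashing.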
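The wraparound itself is easy to demonstrate at compile time. This is a small sketch assuming C++11; the constant names zero and wrapped are mine.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX

constexpr std::size_t zero    = 0;
constexpr std::size_t wrapped = zero - 1;   // unsigned arithmetic wraps modulo 2^N

// Subtracting 1 from 0 yields the largest size_t, not -1.
static_assert( wrapped == SIZE_MAX, "0 - 1 wraps to SIZE_MAX" );

// The wrapped value is greater than zero, which is why a loop
// condition such as ii >= 0 can never become false.
static_assert( wrapped > zero, "wrapped value is still non-negative" );
```

Many compilers will also warn that a comparison like ii >= 0 is always true for an unsigned type, which is worth heeding.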
When using primitives for iterating, this case does lead to a dilemma. However, you've kindly decided to use an STL vector in your example - which means you can use a reverse iterator:
typedef vector<int> vint;
vint v;
// fill v ...
for ( vint::reverse_iterator vi = v.rbegin(); vi != v.rend(); ++vi )
    printf( "%d\n", *vi );
This is the "smoothest" way I can think of for the backwards traversal of a vector. However, when the STL isn't around and such conveniences aren't provided, I think we're stuck with inelegant solutions such as:
typedef vector<int> vint;
vint v;
// fill v ...
if ( !v.empty() ) // v.size() - 1 would wrap around on an empty vector
{
    size_t ii;
    for ( ii = v.size() - 1; ii > 0; ii-- )
        printf( "%d\n", v[ii] );
    printf( "%d\n", v[ii] ); // ii is 0 here: print the first element
}
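One other idiom I have seen for this, offered here only as a sketch, is to start at size() and decrement inside the loop condition, so the index is tested before it can wrap. The function name reversed_copy is hypothetical; it collects the elements instead of printing them so the result is easy to check.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <vector>    // std::vector

// Hypothetical helper: returns the elements of v in reverse order,
// walking the indexes backwards with an unsigned counter.
std::vector<int> reversed_copy( const std::vector<int> & v )
{
    std::vector<int> out;
    out.reserve( v.size() );
    // ii is compared against 0 *before* it is decremented, so the body
    // sees size()-1 down to 0 and the loop stops cleanly afterwards.
    // An empty vector is handled too: 0 > 0 is false on the first test.
    for ( std::size_t ii = v.size(); ii-- > 0; )
        out.push_back( v[ii] );
    assert( out.size() == v.size() );   // sanity check: every element visited once
    return out;
}
```

After the loop exits, ii has wrapped to the maximum value, but it is scoped to the loop and never used again. Whether this reads as elegant or merely clever is a matter of taste.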
Can you or anyone else think of other (elegant) solutions?