Thursday, October 20, 2011

c++ timing strategies compared (in Windows)

My coworkers always come to me with questions about the best way to time a block of C++ code.  I get asked, “I want a high-resolution timer – shouldn’t I use a timer based on cpu clock-cycles?”  I’m also asked, “I need to time something that is very short (microseconds) – so a high-resolution timer will work, right?”

I’ve followed my intuitions for years on this controversial matter.  My observation has been that the CRT function clock() is accurate enough - and I've steered away from timing “really small things” since I’ve seen too much instability in the timing results.

Instead of intuition, though, I’m gonna put a microscope on the matter.

the plan

I figure three things are important to determine regarding C++ timers in Windows:
1. What timer is most accurate?
2. What timer can measure the smallest time?
3. Are timing calls expensive?

I’d like to compare these three timing methods:
1. clock()
2. QueryPerformanceCounter()
3. GetProcessTimes()


the strategy

I’ve written a for-loop that will always take exactly two cpu clock cycles per iteration.  Because I know the clock-speed of my computer, I can predict how long the loop should take.  I will use this super simple for-loop to measure the accuracy, resolution, and overhead of the different timing methods.

for ( unsigned int i = 0; i < iterations; i++ )
{
    __asm { nop }
}

The asm block is required so the compiler doesn’t optimize the loop away.  As it turns out, the nop takes zero clock-cycles (I verified this by adding multiple nop’s, which had no effect on the overall time).  The assembly instructions implementing the for-loop are dec and jne – each of which takes one clock-cycle.

I know how long my for-loop will take given the following equation:

                time = iterations * (number of clock cycles) / (machine clock speed)

I can now solve for iterations:

                iterations = (time) * (machine clock speed) / (number of clock cycles)

For example, solving for iterations given I want to run for one second:

                iterations = (1 sec) * (2.5 GHz) / (2 cycles) = 1 250 000 000
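
That arithmetic is simple enough to drop into a small helper program.  Here's a minimal sketch of the calculation (the 2.5 GHz clock speed and 2 cycles per iteration are assumptions taken from my machine - plug in your own values):

// Sketch only - clock speed and cycle count are assumed values, not measured ones.
#include <iostream>

int main()
{
    const double clockSpeedHz  = 2.5e9;   // assumed machine clock speed (2.5 GHz)
    const double cyclesPerIter = 2.0;     // dec + jne, one cycle each
    const double targetSeconds = 1.0;     // how long the loop should run

    const unsigned long long iterations =
        static_cast<unsigned long long>( targetSeconds * clockSpeedHz / cyclesPerIter );

    std::cout << "iterations = " << iterations << std::endl;   // prints 1250000000
    return 0;
}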


accuracy

For each target-time, I run my simple for-loop for the number of iterations that would equate to the expected time.  Then I have an outer loop that runs 100-10000 times, so I can compute the mean and standard deviation – for all three timing methods.
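
I haven't reproduced the exact harness that generated the tables below, but the sketch that follows shows the idea: read each timer immediately before and after the nop loop, and let the outer loop collect samples for the mean and standard deviation.  (The helper functions and the hard-coded counts are illustrative placeholders, not the code behind the tables.)

// Sketch only - helper names and counts are illustrative, not the original experiment code.
#include <windows.h>
#include <ctime>
#include <cmath>
#include <cstdio>
#include <vector>

// Convert a FILETIME (100-nanosecond units) to seconds.
static double FileTimeToSeconds( const FILETIME& ft )
{
    ULARGE_INTEGER li;
    li.LowPart  = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;
    return li.QuadPart * 100e-9;
}

// Total CPU time (kernel + user) consumed by this process so far, in seconds.
static double ProcessSeconds()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    GetProcessTimes( GetCurrentProcess(), &creationTime, &exitTime, &kernelTime, &userTime );
    return FileTimeToSeconds( kernelTime ) + FileTimeToSeconds( userTime );
}

int main()
{
    const unsigned int iterations = 1250000000u;   // ~1 s at 2.5 GHz, 2 cycles per iteration
    const int          runs       = 100;           // 100-10000 in the experiments

    std::vector<double> clockSamples, qpcSamples, procSamples;

    LARGE_INTEGER freq;
    QueryPerformanceFrequency( &freq );

    for ( int run = 0; run < runs; run++ )
    {
        clock_t       c0 = clock();
        LARGE_INTEGER q0;  QueryPerformanceCounter( &q0 );
        double        p0 = ProcessSeconds();

        for ( unsigned int i = 0; i < iterations; i++ )
        {
            __asm { nop }   // keeps the compiler from optimizing the loop away
        }

        double        p1 = ProcessSeconds();
        LARGE_INTEGER q1;  QueryPerformanceCounter( &q1 );
        clock_t       c1 = clock();

        clockSamples.push_back( double( c1 - c0 ) / CLOCKS_PER_SEC );
        qpcSamples.push_back( double( q1.QuadPart - q0.QuadPart ) / double( freq.QuadPart ) );
        procSamples.push_back( p1 - p0 );
    }

    // Mean and standard deviation for one of the sample sets (repeat for the others).
    double mean = 0.0;
    for ( size_t i = 0; i < qpcSamples.size(); i++ ) mean += qpcSamples[i];
    mean /= qpcSamples.size();

    double variance = 0.0;
    for ( size_t i = 0; i < qpcSamples.size(); i++ )
        variance += ( qpcSamples[i] - mean ) * ( qpcSamples[i] - mean );
    variance /= qpcSamples.size();

    printf( "QueryPerformanceCounter: mean %.6f s, std.dev. %.6f s\n", mean, std::sqrt( variance ) );
    return 0;
}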

mean times

Expected time (s)    clock()      QueryPerformanceCounter()    GetProcessTimes()
2.000000             2.001710     2.001693                     2.004769
1.000000             1.049360     1.049450                     1.004646
0.500000             0.521580     0.521670                     0.503415
0.250000             0.265080     0.265137                     0.255686
0.125000             0.129920     0.129856                     0.125737
0.062500             0.064680     0.064644                     0.063180
0.030000             0.032180     0.032244                     0.030732
0.001000             0.001069     0.001069                     0.001045
0.000100             0.000123     0.000123                     0.000125
0.000010             0.000019     0.000022                     0.000017


standard deviations

Expected time (s)    clock()      QueryPerformanceCounter()    GetProcessTimes()
2.000000             0.001663     0.001567                     0.011982
1.000000             0.068809     0.068873                     0.014298
0.500000             0.033658     0.033675                     0.010314
0.250000             0.019893     0.019949                     0.008796
0.125000             0.010623     0.010622                     0.004844
0.062500             0.004164     0.004179                     0.004053
0.030000             0.005066     0.005025                     0.005153
0.001000             0.000504     0.000473                     0.003962
0.000100             0.000424     0.000275                     0.001407
0.000010             0.000136     0.000010                     0.000562
Average std.dev.     0.0145       0.0145                       0.0065

accuracy conclusion
In almost every way, clock() and QueryPerformanceCounter() perform exactly the same.  Their results, both the means and standard deviations, are nearly equal.  One might even conclude they’re based on the same underlying logic.

On the other hand, GetProcessTimes() generates results that are closer to the expected values than the other two timing methods, and its standard deviations are better on average.

In all three cases, measuring sub-millisecond times became very difficult.  I had to move up to averaging 10000 runs of the outer loop – and even then, repeat the whole experiment several times.  While the standard deviations weren’t too bad for sub-millisecond times, all three timing methods missed the expected time-value by a fair amount.

cost of timing calls

This will be a simpler experiment.  I’ll run the above loop for one million iterations, calling one of the timing functions within the loop body.  Remember, this is the cost of making one million calls – the cost of the loop construct itself is insignificant compared to the cost of the timing calls.
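
Here's a rough sketch of that experiment for the clock() case (the names and structure are illustrative only - swap the loop body to measure QueryPerformanceCounter() or GetProcessTimes() instead):

// Sketch only - ProcessSeconds() is an illustrative helper, not from the original experiment.
#include <windows.h>
#include <ctime>
#include <cstdio>

// CPU time (kernel + user) used by this process so far, in seconds.
static double ProcessSeconds()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    GetProcessTimes( GetCurrentProcess(), &creationTime, &exitTime, &kernelTime, &userTime );
    ULARGE_INTEGER k, u;
    k.LowPart = kernelTime.dwLowDateTime;   k.HighPart = kernelTime.dwHighDateTime;
    u.LowPart = userTime.dwLowDateTime;     u.HighPart = userTime.dwHighDateTime;
    return ( k.QuadPart + u.QuadPart ) * 100e-9;   // FILETIME units are 100 ns
}

int main()
{
    const unsigned int calls = 1000000;   // one million timing calls

    double start = ProcessSeconds();
    for ( unsigned int i = 0; i < calls; i++ )
    {
        clock();   // the call whose overhead is being measured
    }
    printf( "one million clock() calls: %f s\n", ProcessSeconds() - start );
    return 0;
}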

Note this data is averaged over ten runs and based on calls to GetProcessTimes() (since we just determined it’s the most accurate):

clock()        QueryPerformanceCounter()    GetProcessTimes()
4.963952 s     4.812631 s                   0.363482 s

cost conclusion
Again, clock() and QueryPerformanceCounter() are so well matched I’m nearly convinced they have the same underlying implementation.  On the other hand, GetProcessTimes() beats them both by an order of magnitude, making it the most lightweight timing call of the three.

overall conclusion

  • clock() and QueryPerformanceCounter() are almost definitely based on the same underlying implementation (at least for Visual Studio 2010).
  • GetProcessTimes() is the accuracy winner.  It gets closer to the expected time, and generates more consistent (lower standard deviation) results.
  • GetProcessTimes() is also the call-performance winner, being faster to call than the other two by an order of magnitude.
  • Below 1 ms, all three timers struggle to measure accurately - even with lots of averaging.  I don’t recommend trying to measure such small intervals.

While you may conclude you should use GetProcessTimes() for everything, I need to end with some important facts regarding it.

GetProcessTimes() only measures active CPU time – so if you call Sleep(), it doesn’t count the time spent sleeping.  Also, if the algorithm you’re benchmarking performs some type of I/O, GetProcessTimes() won’t measure the time spent waiting for I/O resources.
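
A quick way to see this for yourself (a toy example, separate from the benchmark above): time a one-second Sleep() with both clock() and GetProcessTimes().  clock() reports roughly the full second of wall-clock time, while GetProcessTimes() reports almost nothing, because the process did no CPU work.

// Toy example only - demonstrates the Sleep() caveat, not part of the benchmark.
#include <windows.h>
#include <ctime>
#include <cstdio>

int main()
{
    FILETIME creationTime, exitTime, kernel0, user0, kernel1, user1;

    clock_t c0 = clock();
    GetProcessTimes( GetCurrentProcess(), &creationTime, &exitTime, &kernel0, &user0 );

    Sleep( 1000 );   // one second of wall-clock time, but essentially no CPU time

    GetProcessTimes( GetCurrentProcess(), &creationTime, &exitTime, &kernel1, &user1 );
    clock_t c1 = clock();

    ULARGE_INTEGER u0, u1;
    u0.LowPart = user0.dwLowDateTime;   u0.HighPart = user0.dwHighDateTime;
    u1.LowPart = user1.dwLowDateTime;   u1.HighPart = user1.dwHighDateTime;

    printf( "clock():           %f s\n", double( c1 - c0 ) / CLOCKS_PER_SEC );
    printf( "GetProcessTimes(): %f s (user time only)\n", ( u1.QuadPart - u0.QuadPart ) * 100e-9 );
    return 0;
}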

Finally, when multiple threads are working concurrently within the process, GetProcessTimes() reports the aggregate amount of time spent across all threads.  Therefore, if you’re trying to benchmark the performance benefit of your new multithreaded algorithm, GetProcessTimes() is not the right choice.

These two issues with GetProcessTimes() are so glaring that I always use clock().  It is by far the easiest to call, and it simply measures “wall-clock” time, which is almost always exactly what I want.  (There are times for using GetProcessTimes() - just keep these caveats in mind.)

Regardless of the timing method you choose for your C++ code:
  • To ensure reasonable accuracy, focus on timing things that take more than 30 ms.
  • Do many runs; take an average.
  • Do your best to “quiet” the machine that is performing the benchmark by closing all unneeded programs and background processes.
  • Remember that calls to your timing method aren’t free.
  • Always benchmark using fully optimized builds outside the debugger (or Ctrl-F5 from within the IDE).

Wednesday, October 19, 2011

maybe I've figured out the AMD Bulldozer strategy

In my prior post, I came down pretty hard on AMD's recent release of their Bulldozer cpu.  I had concluded that AMD was poisoned from the inside - that they put all their money on an engineer's long-shot idea.  That still might be right - but after giving this more thought, there is another possibility - and I hope I'm right.

I'm a big believer in approaching problems laterally (my first post discusses a lateral approach to goal setting).  What if AMD, instead of tackling Intel head-on, is taking a lateral approach?

Let's face it, AMD and Intel are in a longstanding war.  Intel clearly has more troops (staff), better equipment (their own fabs), and has honed its talents in frontal assault (making insanely fast chips).  Being smaller and without equally good weaponry, AMD has to think laterally.

As a frontal assault, the Bulldozer wasn't a total flop.  AMD made a strong effort - not as strong as the market leader - but strong nonetheless.  Now, remember the big picture - the Bulldozer is just one volley in a longstanding war.  AMD, we have to assume, has something up its sleeve.

The market for cpus goes beyond the desktop.  In fact, many argue the desktop is a dying breed.  Where are the fastest growing trends for cpus?  Massive servers (many cores, low power) for virtualization, cloud computing, and databases.  (Not to mention tablets and smart phones.)  In AMD's case, guess what?  They actually kick ass in the server space.  Check out the CPU Mark for multi-cpu systems - AMD has three systems in the top five.

Battling over the desktop is like battling over a huge land mass that is becoming less-and-less important to the overall war strategy.  My prediction is that AMD will, within the next year, deliver a brutal blow to Intel in the server cpu space.  Intel will be caught off guard, because their troops are largely amassed for maintaining desktop supremacy.

Do you believe in the future that many pundits predict, of smart phones, tablets, and browser-based PCs?   If so, then complex computing is moving more and more to back-end servers.  And if that's the case, then AMD's server strategy is a pretty darn good one.



Saturday, October 15, 2011

the "wish and hope" strategy

As a self-proclaimed AMD fan-boy, I have to add my commentary on their recent Bulldozer release.

I'm really disappointed.

It isn't just a matter of being disappointed in the chip.  The Bulldozer failure is a reflection on the engineers and management at AMD.  They must have adopted a "wish and hope" strategy - and that's unfortunate.  They hoped that setting a World Record overclock would matter - but it doesn't.  They hoped having eight cores would be a strong differentiator - but it isn't.

I imagine there was an AMD Bulldozer engineer who convinced his Product Marketing what direction to take the chip.  This engineer's approach was academic - focusing on architectural perfection - instead of what the market wanted.

All we wanted was a new cpu that would put a beat-down on the Core i7 - and AMD didn't deliver.  In my opinion, AMD should have held the Bulldozer release.  Would BMW release a new M5 that couldn't beat Mercedes' latest AMG?  No way!!  I think I speak for many when I say I want to be proud of the cpu I choose for my system.  I've been taking a beating from my friends for years for continuing to support AMD.  I knew I would be able to dish it right back at them when the Bulldozer came out... but no... I'm still the dummy.

You goofed up, AMD.  You lost sight of what was important, you rushed a product to market, and you "wished and hoped" we'd be okay with it.  Left with the same decision, I would have simply moved the Phenom to 32 nm and made it an 8-core.  That'd be a true 8-core, none of this shared floating-point pipeline kind-of-but-not-quite-hyperthreading Bulldozer mess.  Instead, you're end-of-life-ing the Phenom.  What?!

Here's my message for everyone trying to bring a new product to market:  don't "wish and hope" that your potential future customers won't notice if you release something that isn't the best - because they will.  Instead, work with your customers:  learn how they'll benchmark your product, and learn what qualities will distinguish your product as a great one.  Then focus like hell on these "qualities of greatness" - and you won't have to wish or hope.

Friday, October 14, 2011

tampering voids warranty

I've been in software R&D for more than 17 years.  I find the recent trends towards Extreme Programming, Agile, Scrum, etc., very interesting to observe.  Before I share my observations, consider my background:  in those 17 years, I've been a technical first-level manager for ten years, and a second-level director for two years.  I've seen these processes in action as a developer, manager, and director.  In fact, one of my responsibilities as director was "development process".  I've been through Agile training and I've used leading software (Rally) for planning/tracking iterations.

upper management's view

Upper management tends to see Agile and its kin as "the next great thing".  It makes sense:  these processes promise high quality deliverables and fine-grained control of release dates.  Also, because customer involvement is high, the deliverable will be precisely what the customer wants - no more, no less.  These are things the VP R&D struggles to get right every day - Agile processes are seen as a "silver bullet".

first and second level manager's view

First and second level managers get the bum end of the deal when it comes to rolling out Agile.  They have to understand the new process well enough to manage within it.  They have to deal with the full spectrum of developer reactions to this change:  some developers will fully embrace it, while others will do everything they can to avoid it.  They have to answer to the VP R&D - and try to explain why the "silver bullet" is taking so long to work.

developer's view

Developers all have something in common:  they are motivated to write awesome code and get recognized for doing so.  Daily scrums, day-plus-long planning sessions every iteration, and retrospectives are just a bunch of meetings that get in the way of being awesome.  The majority of developers tend to resent the number of meetings, seeing them as low value.  The final issue:  Agile makes developers feel a bit claustrophobic, since every aspect of their day has to be "part of the iteration plan, in support of their team".

the challenge

If you read about Agile and its kin processes, they all say the same thing:  "If you don't follow every aspect of the process, it won't work".  I call this a "tampering voids warranty" disclaimer - and I don't like it.  If someone approaches me with the-next-great-thing and it has a "tampering voids warranty" disclaimer, there's a 99% chance I'll reject it.  Let me explain a bit further with an analogy.

I see anything that has a "tampering voids warranty" disclaimer as an immutable puzzle piece.  You're the other puzzle piece.  Since the-next-great-thing puzzle piece is perfect and immutable - you have to change yourself, your team, your family, etc., to make the fit work.  The natural human response to this is to resist.  Not because of the old saying that "change is hard".  People resist because deep down, we all know that there is no silver bullet and the-next-great-thing isn't perfect - it goes against human intuition.  Also, the more you ask someone to change, the more benefit that person should see from having made the change.

Coming back to Agile, the people being asked to change are the developers and the first and second level managers - and no matter how much you polish it, they will see the change to Agile as a net loss for them.  The VP R&D, however, doesn't have to change at all - all s/he has to do is say, "we're an Agile shop now" - and s/he and the company get all the benefits.

so - what to do?

I wish I had a magical answer here - but I don't.  And you wouldn't believe me if I said I did, since I just made the case that there is no silver bullet process.

What I will say is this:  to be a great manager, director, or VP, you have to resist falling prey to the-next-great-process - especially ones that come with a "tampering voids warranty" disclaimer.  Great products come from great people - not process.  You simply can't invest heavily enough in your people - and I'll be honest, molding great developers/managers is a lot of work - but it always pays off.

As for process, I say tear off the "tampering voids warranty" tag and see what happens.  Trust your team to come up with variations on the process, and let it evolve.  Remember:  the most important aspect of the process you choose is that it brings benefit to the people you're asking to change.

Wednesday, October 12, 2011

goal setting

Goal setting happens everywhere - at work and at home.  A personal goal might be, "I want to lose weight."  Your manager may have a performance goal for you, "Improve your communication skills."

These are both great goals - to achieve both would make you healthy (losing weight) and professionally stronger (better communicator).  However, a crucial aspect is missing:  how will you achieve these?!

I bet you thought the missing part was "making them measurable".  Agreed.  These goals lack measure.  But that is easily fixed.  Lose weight:  "I want to lose 25 lbs."  Communicate better:  "Demonstrate improved communication skills by presenting a well-rated presentation at this year's User's Conference."  I'm flying past "making these measurable" because, to be honest, that isn't the hard part.

The hard part is setting up a plan to achieve these goals.  That's because the foremost strategy for achieving a difficult goal is to attack it head-on.  The challenge with the head-on strategy is that it will feel like you're making an incredible sacrifice.  Your motivation will waver, and usually break, for two reasons:
  1. You're attempting to establish a new behavioral pattern that is less enjoyable than your old ways.
  2. Change in oneself takes time; slow progress will make you question and usually abandon the goal.
Let's take weight loss as an example.  The typical approach is head-on:  diet, cardio, and resistance training.  Diet is a sacrifice - let's face it, fries, chips, butter, soda, and fast-food all taste good for a reason - and you've banned them.  Exercise takes time - and it hurts.  So you're eating less delicious food and you're sore all the time.  Your measure of progress is your weight scale.  You get on the scale every morning - and progress is brutally slow.  In as little as a couple of weeks, you may start wondering, "All this sacrifice for 1 lb a week?!?!"

Here's my suggestion:  choose a lateral, or indirect goal.  The lateral goal you choose should have these qualities:
  • The side-effect of the lateral goal will be progress against the main goal.
  • Visible progress is easily made - this is critical!

Again, let's come back to weight loss.  Here are some examples of lateral goals for losing weight:
  • Eat "clean" for 21 days.
    • The measure here is to count days.
  • Train for a half-marathon that is nine months away.
    • The measure here will be progress with the distance/time you're running.
  • Enter a weight training regimen with the goal to bench-press, deadlift, and squat your bodyweight.
    • The measure here is the amount of weight you lift.

I admit that a head-on strategy will achieve the same result as a lateral approach - and maybe even faster.  But that is a surface-level perspective.  If you break the surface, people who take a head-on approach are less likely to continue their endeavor - because the psychological "net worth of the investment" will be viewed as a loss.  Progress came too slowly, and the sacrifice was significant.

On the other hand, a lateral approach has measures that show progress more quickly.  Even if you only run another quarter mile this week, or get another rep at 135 lb, you're seeing visible progress sooner.  The psychological net worth will be viewed positively.  As a result, you'll be motivated to continue.

I wager that most goals are in desperate need of lateral thinking: 
  • "I want to save more money."
    • Lateral:  read a book a week for a year (I suggest the library).  This would have multiple side-effects.  Less electricity used (less TV).  Instead of going out to the movies, start a book club.  Most importantly, learn something new (improve yourself professionally, earning more $$$).
  • "I want to improve my relationship with my spouse."
    • Lateral:  ask him/her one question about their day, every day.
  • "I want to make my product the best."
    • Lateral:  institute a corporate-wide emphasis on first-class techniques for hiring and keeping the best staff.

I'm a big believer in lateral goal setting - I've had a lot of luck with it!  Good luck approaching your goals laterally!