Thursday, December 27, 2012

intel's hyperthreading: super smart marketing of a super weak feature

Here's my beef:  Intel asks consumers to pay 50% more for their hyperthreaded chips, but those hyperthreaded chips will only ever, in the best-case scenarios, be 10-15% faster than their non-hyperthreaded cousins.  Few people understand exactly what this means - so let's discuss.

Consider Intel’s latest Ivy Bridge series.  The i5-3570k is a quad-core clocked at 3.4 GHz and does not support hyperthreading.  The i7-3770 is a quad-core clocked at 3.4 GHz and supports hyperthreading.  When you bring up the Windows Task Manager with the i5-3570k, you will see four cores.  When you bring up the Windows Task Manager with the i7-3770, you will see eight cores (four physical cores plus four hyperthreaded cores).  As a result, you, the consumer, might be led to believe the i7 is twice as fast as the i5 – it has twice as many cores, right?!  Sorry, not even close.  Here’s the real story.

If you understand parallel computing, you know that a perfectly parallelized algorithm will continue to improve its performance based on the number of available cores - it's called scaling.  The computation is simple.  If a perfectly parallelizable algorithm takes 840 s to compute using a single core, it will take 420 s using two cores (twice as fast, or 2x), and 210 s using four cores (four times as fast, or 4x).  But what happens if we perform this scalability test on an i7-3770?  The numbers (assuming the best case, in which each hyperthreaded core performs at 15% of the speed of a physical core):

i7-3770 cores employed    Time (s)
1                         840
2                         420 (2x)
3                         280 (3x)
4                         210 (4x)
5                         202 (4.15x)
6                         195 (4.3x)
7                         189 (4.45x)
8                         183 (4.6x)

Did you see what happened after four cores?  The scaling tanked.  Even using all of the i7-3770’s eight cores is only about 15% faster than using four cores (4.6x versus 4x) – and that’s assuming the best-case scenario that each hyperthreaded core is 15% as fast as a physical core.  In a system with eight real, physical cores, the performance should have scaled to 105 s (eight times as fast as one core, or 8x).
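If you want to check the arithmetic yourself, here is a rough Python sketch of the model behind the table.  It is not a benchmark, just the same back-of-the-envelope math: it assumes a perfectly parallel job, four physical cores, and that every hyperthreaded core beyond those four adds 15% of a physical core.  The name `modeled_time` and its parameters are made up for this illustration.

```python
def modeled_time(threads, base_time=840.0, physical_cores=4, ht_gain=0.15):
    """Modeled runtime (seconds) of a perfectly parallel job.

    Assumptions (illustrative only): the first `physical_cores` threads each
    count as one full core, and every thread beyond that counts as `ht_gain`
    of a core.
    """
    full = min(threads, physical_cores)        # threads on real cores
    extra = max(threads - physical_cores, 0)   # threads on hyperthreaded cores
    speedup = full + ht_gain * extra
    return base_time / speedup


for n in range(1, 9):
    t = modeled_time(n)
    print(f"{n} cores: {t:6.1f} s ({840.0 / t:.2f}x)")
```

Setting `ht_gain=0.10` shows the lower end of the 10-15% range, and setting `physical_cores=8` shows what eight real cores would do under the same model.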

That's the ideal world – but as it turns out, few real-world algorithms will ever get close to seeing the 10-15% hyperthreaded improvement hypothesized.  While purely scientific applications will sometimes approach this ideal, the real world is riddled with constraints:  in particular, threads/processes share memory, disk, and oftentimes data structures and other resources within the parallel application itself.  Shared resources prevent ideal scaling, as concurrently executing threads/processes are forced to wait for one another to acquire/use these shared resources.  This means there is just a fraction of room left for hyperthreading to make its little voice heard above the loud, real-world din of shared resources.
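One hedged way to put a number on that "fraction of room" is Amdahl's law, the textbook formula for speedup when part of the work cannot run in parallel.  That formula is my illustration here, not something measured for this post, and the 20% serialized share below is a purely hypothetical stand-in for time spent waiting on shared resources.  The sketch treats the hyperthreaded i7 as 4.6 "effective cores", as in the best-case model above.

```python
def amdahl_speedup(workers, serial_fraction):
    """Amdahl's law: speedup when `serial_fraction` of the work is serialized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


SERIAL = 0.20  # hypothetical share of time spent waiting on shared resources

without_ht = amdahl_speedup(4.0, SERIAL)  # four physical cores
with_ht = amdahl_speedup(4.6, SERIAL)     # four cores plus 15% each from hyperthreading

print(f"without hyperthreading: {without_ht:.2f}x")
print(f"with hyperthreading:    {with_ht:.2f}x")
print(f"hyperthreading gain:    {(with_ht / without_ht - 1) * 100:.1f}%")
```

Under these made-up numbers the best-case 15% win from hyperthreading shrinks to roughly 7%, and it keeps shrinking as the serialized share grows.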

Here’s a real-world algorithm that I deal with every day:  compiling my C/C++ source code.  Visual Studio (and other third-party build systems) supports multithreaded/multiprocess builds.  This means that the compilation phase is done in parallel.  When compiling, each thread/process is reading a source file from disk and writing an object file to disk – concurrently.  What’s the shared resource?  The disk.  What can happen to disks?  Fragmentation.  What exacerbates fragmentation?  Lots of threads/processes writing to a disk concurrently.  Which is exactly what is happening!  As much as I love the fact that multithreaded/multiprocess builds speed up my compilation phase, fragmentation causes the “win” to diminish over time – and eventually, the subsequent link phase, which is particularly disk intensive and single-threaded, becomes a disappointing drag.  What happens with hyperthreading?  Well, even if hyperthreading improves the compilation performance by 10% (a best case, given that the disk is shared), disk fragmentation is increasing at twice its normal rate - and whatever win was witnessed during compilation is quickly lost over time – and can sometimes lead to reverse scaling, where using more cores actually slows things down.  (And don't even get me started on how Windows manages its NTFS partitions and how performance can degrade over time, despite the fact they've led consumers to believe that "defragging" is all they have to do... this cake is a lie.)

I have to give credit to the Intel marketing department.  They spun the hyperthreading message and have made tons of money as a result.  Even human intuition is on their side – when I see eight cores in Task Manager, I must be wayyyyy faster than with four cores, right?!  Here’s what’s crazy – even though there are tons of independent benchmarks to back up my statements (that i7s are only fractionally faster than i5s) – people still dole out the big bucks for the i7-3770.  Please stop.  Please use your common sense.  Invest your money sensibly.  Buy an i5-3570k and use the savings to buy a bad-ass solid-state disk (SSD).  I wager you’ll never notice the difference between an i5-3570k and an i7-3770 – but I guarantee your life will be shockingly improved when you upgrade from a mechanical disk to a killer SSD.