Consider Intel’s latest Ivy Bridge series. The i5-3570K is a quad-core clocked at 3.4 GHz and does not support hyperthreading. The i7-3770 is a quad-core clocked at 3.4 GHz and supports hyperthreading. When you bring up the Windows Task Manager with the i5-3570K, you will see four cores. When you bring up the Windows Task Manager with the i7-3770, you will see eight cores (four physical cores plus four hyperthreaded cores). As a result, you, the consumer, might be led to believe the i7 is twice as fast as the i5. It has twice as many cores, right?! Sorry, not even close. Here’s the real story.
If you understand parallel computing, you know that a perfectly parallelized algorithm will continue to improve its performance based on the number of available cores. It's called scaling. The computation is simple. If a perfectly parallelizable algorithm takes 840 s to compute using a single core, it will take 420 s using two cores (twice as fast, or 2x), and 210 s using four cores (four times as fast, or 4x). But what happens if we perform this scalability test on an i7-3770? Here are the numbers (assuming the best case, in which each hyperthreaded core performs at 15% of the speed of a physical core):
i7-3770 cores employed | Time (s)
1 | 840
2 | 420 (2x)
3 | 280 (3x)
4 | 210 (4x)
5 | 202 (4.15x)
6 | 195 (4.3x)
7 | 189 (4.45x)
8 | 182 (4.6x)
Did you see what happened after four cores? The scaling tanked. Even using all eight of the i7-3770’s cores is only about 15% faster than using four cores (4.6x versus 4x), and that’s assuming the best-case scenario that each hyperthreaded core is 15% as fast as a physical core. In a system with eight real, physical cores, the performance should have scaled to 105 s (eight times as fast as one core, or 8x).
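
If you want to check the arithmetic yourself, here is a minimal Python sketch of the model behind the table. It assumes four full-speed cores and that every thread beyond the fourth adds only 15% of a core. The function names (`effective_cores`, `run_time`) and the `ht_gain` parameter are made up for illustration; this is a back-of-the-envelope model, not a benchmark.

```python
def effective_cores(threads, physical_cores=4, ht_gain=0.15):
    """Effective core count under the simple model used in the table.

    Threads up to `physical_cores` each count as one full core; every
    additional (hyperthreaded) thread counts as `ht_gain` of a core.
    Both defaults are assumptions taken from the best case in the text.
    """
    full = min(threads, physical_cores)
    extra = max(threads - physical_cores, 0)
    return full + extra * ht_gain


def run_time(single_core_seconds, threads, physical_cores=4, ht_gain=0.15):
    """Run time of a perfectly parallelizable job on `threads` threads."""
    return single_core_seconds / effective_cores(threads, physical_cores, ht_gain)


if __name__ == "__main__":
    # Reproduce the table for the 840 s single-core job.
    for n in range(1, 9):
        print(f"{n} | {run_time(840, n):.1f} s ({effective_cores(n):.2f}x)")
```

Calling `run_time(840, 8, ht_gain=1.0)` treats all eight threads as real cores and gives the 105 s figure, while the default `ht_gain=0.15` gives roughly the times in the table.
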
The numbers above describe the ideal world, but as it turns out, few real-world algorithms will ever get close to seeing the 10-15% hyperthreaded improvement hypothesized. While purely scientific applications will sometimes approach this ideal, the real world is riddled with constraints: in particular, threads and processes share memory, disk, and oftentimes data structures and other resources within the parallel application itself. Shared resources prevent ideal scaling, as concurrently executing threads and processes are forced to wait for one another to acquire and use these shared resources. This means there is just a fraction of room left for hyperthreading to make its little voice heard above the loud, real-world din of shared resources.
Here’s a real-world algorithm that I deal with every day: compiling my C/C++ source code. Visual Studio (like other third-party build systems) supports multithreaded/multiprocess builds. This means that the compilation phase is done in parallel. When compiling, each thread/process is reading a source file from disk and writing an object file to disk, concurrently. What’s the shared resource? The disk. What can happen to disks? Fragmentation. What exacerbates fragmentation? Lots of threads/processes writing to a disk concurrently. Which is exactly what is happening! As much as I love the fact that multithreaded/multiprocess builds speed up my compilation phase, fragmentation causes the “win” to diminish over time, and eventually the subsequent link phase, which is particularly disk intensive and single-threaded, becomes a disappointing drag. What happens with hyperthreading? Well, even if hyperthreading improves the compilation performance by 10% (a best case given the disk is shared), twice as many threads/processes are writing to the disk, so disk fragmentation is increasing at twice its normal rate. Whatever win was witnessed during compilation is quickly lost over time, and this can sometimes lead to reverse scaling, where using more cores actually slows things down. (And don't even get me started on how Windows manages its NTFS partitions and how performance can degrade over time, despite the fact they've led consumers to believe that "defragging" is all they have to do... this cake is a lie.)
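
To put rough numbers on why a single-threaded link phase eats into the win, here is a small sketch in the spirit of Amdahl's law: the compile phase is divided by the speedup, the link phase is not. The 600 s compile and 120 s link figures are hypothetical, not measurements from my builds, and `build_time` is an illustrative helper. The model ignores disk contention and fragmentation entirely, so a real build should do worse than this.

```python
def build_time(compile_seconds, link_seconds, speedup):
    """Total build time when only the compile phase runs in parallel.

    `compile_seconds` is the single-core compile time, `speedup` is how
    much faster the compile phase runs in parallel, and `link_seconds`
    is the single-threaded link phase, which is never sped up.
    """
    return compile_seconds / speedup + link_seconds


# Hypothetical workload: these numbers are assumptions for illustration only.
COMPILE_S = 600.0
LINK_S = 120.0

if __name__ == "__main__":
    scenarios = [
        ("1 core", 1.0),
        ("4 cores", 4.0),
        ("4 cores + hyperthreading (best case)", 4.6),
    ]
    baseline = build_time(COMPILE_S, LINK_S, 1.0)
    for label, speedup in scenarios:
        total = build_time(COMPILE_S, LINK_S, speedup)
        print(f"{label}: {total:.1f} s ({baseline / total:.2f}x overall)")
```

With these made-up numbers, going from 4x to 4.6x on the compile phase shortens the whole build by well under 15%, because the link phase does not speed up at all.
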
I have to give credit to the Intel marketing department. They spun the hyperthreading message and have made tons of money as a result. Even human intuition is on their side: when I see eight cores in Task Manager, I figure I’m wayyyyy faster than with four cores, right?! Here’s what’s crazy: even though there are tons of independent benchmarks to back up my statements (that i7s are only fractionally faster than i5s), people still dole out the big bucks for the i7-3770. Please stop. Please use your common sense. Invest your money sensibly. Buy an i5-3570K and use the savings to buy a bad-ass solid-state disk (SSD). I wager you’ll never notice the difference between an i5-3570K and an i7-3770, but I guarantee your life will be shockingly improved when you upgrade from a mechanical disk to a killer SSD.