CPU cores are all made the same, right? Hyper-Threading is just a fancy way of saying “Push the turbo button harder!”
Actually, Wikipedia informs me that I’m wrong and Hyper-Threading is a fancy (and trademarked) way of saying “You can do more than one thing on a core at the same time because computers are a pack of lies.”
The General Idea
Many people seem to operate under the assumption that any one core is as good as any other core. And since two cores are better than one, they want to know: why doesn’t my CPU with 4 physical cores run twice as fast when I turn on Hyper-Threading?
Hyper-Threading works by taking advantage of the idea that computers are usually off doing other things – waiting for storage, waiting for the network, waiting for RAM, waiting for you to click “Buy it now” on 10 pounds of socks. Each physical core shows up as two logical cores, so while one thread has the core stuck waiting, the core goes off and does something else for the other thread, like sending your credit card numbers to hackers.
Hyper-Threading exists because computers spend most of their time doing nothing, so they might as well try to be productive.
What’s That Mean For Us?
For most workloads, Hyper-Threading is great. You’re usually waiting on storage, so you might as well go ahead and send those credit card numbers off elsewhere. For CPU intensive workloads, you have to use your brain a little bit and say “Wait a minute, if I can scale linearly to the number of physical cores, what happens when I’m pretending I have more cores than I really have?”
Since this is computers and not the global financial sector, circa 2007, you hit a performance cliff. When CPU is your bottleneck, faking it won’t make anything faster.
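To put a rough number on that cliff, here is a back-of-the-envelope sketch. It assumes the gloomiest case, where the extra logical cores add no real execution capacity at all, so CPU-bound workers simply split the physical cores evenly. The function name and the sample numbers are made up for illustration; real hardware usually lands somewhere better than this.

```python
def ideal_task_ms(workers: int, physical_cores: int, single_task_ms: float) -> float:
    """Per-task time if logical cores add no capacity (a pessimistic assumption)."""
    # Up to one worker per physical core, nobody has to share.
    # Past that, workers split the cores evenly and every task stretches out.
    oversubscription = max(1.0, workers / physical_cores)
    return single_task_ms * oversubscription


# Hypothetical machine: 4 physical cores, 1000 ms per task when run alone.
for workers in (1, 4, 8):
    print(workers, "workers:", ideal_task_ms(workers, 4, 1000.0), "ms per task")
```

Under that assumption, 8 CPU-bound workers on 4 physical cores each take twice as long as a lone worker would.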
How can I say all of this so jovially? Because I broke my computer, that’s why.
Oh Crap, He Wrote Code!
That’s right, I wrote code. I wrote a program that I call The HyperThreader. It’s dumb as a brick – it counts from 1 to 10E8 and then computes the square root of that number. This is a CPU-intensive workload; no disks were harmed. The program then does the same thing but across 6 workers (the number of physical cores I have) and then again across 12 workers (the number of logical cores I have).
You can see the raw results over on GitHub.
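If you want to poke at the same idea yourself, here is a minimal sketch in Python. It is not The HyperThreader itself, just an approximation of the description above. A few things are my assumptions: the function names, the scaled-down `LIMIT`, the use of worker processes instead of threads (CPython threads don't run CPU-bound work in parallel), and the guess that the machine has two logical cores per physical core.

```python
import math
import os
import time
from concurrent.futures import ProcessPoolExecutor

LIMIT = 10**7  # ASSUMPTION: scaled down from the post's 10E8 so Python finishes quickly


def count_and_sqrt(limit: int) -> float:
    """Count from 1 to limit, then take the square root of the final count."""
    count = 0
    for _ in range(limit):
        count += 1
    return math.sqrt(count)


def timed_task(limit: int) -> float:
    """Run one task and return its elapsed time in milliseconds."""
    start = time.perf_counter()
    count_and_sqrt(limit)
    return (time.perf_counter() - start) * 1000.0


def average_task_ms(workers: int, limit: int = LIMIT) -> float:
    """Submit one task per worker process and average the per-task times."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        timings = list(pool.map(timed_task, [limit] * workers))
    return sum(timings) / len(timings)


def report() -> None:
    """Print average task time for 1 worker, physical cores, and logical cores."""
    logical = os.cpu_count() or 1    # logical cores, hyper-threads included
    physical = max(1, logical // 2)  # ASSUMPTION: two hardware threads per core
    for workers in sorted({1, physical, logical}):
        print(f"{workers} workers: {average_task_ms(workers):.3f} ms average")
```

To run it, call `report()` from under an `if __name__ == "__main__":` guard, which the process pool needs on platforms that start workers by re-importing the script. The numbers will depend on your hardware and on whatever else is running.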
Here’s what happens:
1 thread – average time of 1151.857ms
6 threads – average time of 1194.262ms
So far so good. Execution time isn’t really changing because each task is off wandering around on its own processor core. We can account for the roughly 42ms difference between these two because I was playing Paula Abdul’s greatest hits in the background.
12 threads – average time of 1831.81ms
Since this isn’t twice as slow, I’m going to assume that I’m not using all of my CPUs on each task (something could probably be more efficient), but this leads me to my conclusion…
IT’S ALL FILTHY DIRTY LIES!
This is where people usually get tripped up. Each task gets around 53% slower than it was with 6 threads when I start pretending I have resources available. Windows, and the .NET Framework, do their best to pretend that I have resources available. But the fact is that I don’t. I only have 6 physical cores, so with 12 workers, two threads end up sharing each one. If resources were still available, the average execution time would be closer to what we saw with only 1 thread.
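If you'd like to check my math on that slowdown, the arithmetic is short. This sketch just plugs in the averages reported earlier; the dictionary and function names are mine.

```python
# Average per-task times in milliseconds, copied from the results above.
AVERAGE_MS = {1: 1151.857, 6: 1194.262, 12: 1831.81}


def percent_slower(baseline_ms: float, observed_ms: float) -> float:
    """How much longer observed_ms is than baseline_ms, as a percentage."""
    return (observed_ms / baseline_ms - 1.0) * 100.0


print(f"12 vs 6 threads: {percent_slower(AVERAGE_MS[6], AVERAGE_MS[12]):.1f}% slower")
print(f"12 vs 1 thread:  {percent_slower(AVERAGE_MS[1], AVERAGE_MS[12]):.1f}% slower")
print(f"6 vs 1 thread:   {percent_slower(AVERAGE_MS[1], AVERAGE_MS[6]):.1f}% slower")
```

The 53% figure comes from comparing 12 threads against 6 threads. Measured against the single-thread run, the slowdown is closer to 59%.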
If you’re wondering why your SQL Server In-Memory OLTP demo doesn’t scale beyond the number of physical cores, now you know – because you can’t imagine performance out of nothing. That’s like saying “This 4 cylinder car can haul a family of 4, so to take the extended family out and about, I need a V12” and then rushing out to buy a supercar with only 2 seats.
Hat tip to Josh Bush and Dave Liebers for eyeballing the code to make sure it did what it claimed.