Alright, so about 2 months ago, I talked about what your computer needs to make it work. Today, I’m going to talk about some of the lesser-known fundamentals. If the last post was Computer Hardware 101, this is probably CH151. Still a freshman-level class, but a little more advanced. CH101 is definitely a prerequisite, so if you’re not familiar with the Desktop metaphor, you should read Computer Hardware: Basics Pt. 1 first.
Remember that a CPU was really just you mindlessly filling out paperwork (instructions) and shuffling papers around your desk (RAM). Now, imagine that instead of just you working on a task, you have a friend working on it with you. (yay friendship!) In theory, this means that you can work on that task twice as fast! In practice, however, this is a little more complicated. You see, you and your friend can’t work on the same piece of paper at the same time, so you’ll only see a time benefit if your friend has a totally separate task to complete. This is exactly what happens in a multi-core CPU.
Let’s do a practical example. Consider the sandwich. (I’m TAing in the fall, so I’m practicing that “Pretentious Professor” speak, haha.) In the example I gave, the program runs sequentially; that is, it does one instruction at a time. This is what’s known as a single-threaded program, because the program uses only one core at a time. Now, let’s say both you and your friend are tasked with making two sandwiches. You could load up two instances of MakeSandwich.exe and Boom! You’ve got two in the same time it took you to make one! This is great if you have to make 2+ sandwiches, but it’s inefficient if you only need to make one. Most high-end laptop processors have 4 cores, and AMD just announced a massive 32-core processor. What are the rest of those 3-31 cores supposed to do if you only need one sandwich?
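Here’s a minimal sketch of that single-threaded sandwich in code. The function and step names are made up for illustration (there is no real MakeSandwich.exe, sadly), and the sleeps stand in for work:

```python
import time

def wash_lettuce():
    time.sleep(0.1)  # pretend each step takes a little while

def lay_out_meat():
    time.sleep(0.1)

def toast():
    time.sleep(0.1)

def make_sandwich():
    # Single-threaded: one core does every step, one after another.
    wash_lettuce()
    lay_out_meat()
    toast()

start = time.perf_counter()
make_sandwich()
print(f"one cook: {time.perf_counter() - start:.2f}s")  # roughly 0.3s
```

Running two copies of this program at once is how two cores make two sandwiches in the time of one.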
Consider again the sandwich. This time, instead of you making the whole sandwich yourself, you and your friend divide up the tasks. While you’re laying out the meat on the bread, your friend is washing the lettuce. While you’re toasting the sandwich to perfection, your friend is cleaning up. This is a multi-threaded program and allows multiple cores to do tasks in parallel, which is much more efficient. Now, this example also brings up another important point: not all tasks can be parallelized easily. I really can’t imagine more than two people making a single sandwich at the same time. For this task, you won’t notice much of a difference in speed between you and your friend (dual core) and you and a cooking staff (that 32-core monster).
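The divide-up-the-tasks version can be sketched with Python threads. Again, the step names are hypothetical stand-ins, with sleeps for the actual work; the point is that independent steps overlap in time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wash_lettuce():
    time.sleep(0.1)  # your friend's job

def lay_out_meat():
    time.sleep(0.1)  # your job, at the same time

def clean_up():
    time.sleep(0.1)  # has to wait until the sandwich is done

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Two cooks working in parallel on independent steps...
    pool.submit(lay_out_meat)
    pool.submit(wash_lettuce)
# ...then the cleanup step runs after both finish. Total is about
# 0.2s instead of the 0.3s a single cook would need.
clean_up()
print(f"two cooks: {time.perf_counter() - start:.2f}s")
```

Notice that clean_up still has to wait for the others, which is exactly why not every task parallelizes neatly.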
Also, remember that neither you nor your friend can work until everything you need is loaded from the cabinet to the desk (remember, that speed is dictated by the transfer rate of your hard drive), so having the second person does absolutely nothing for overall speed if you spend the majority of the time getting the files out of the cabinet.
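There’s a classic formula for this cabinet problem called Amdahl’s law: if some fraction of a job can’t be split up (like waiting on the hard drive), that fraction caps your speedup no matter how many cores you throw at the rest. A quick sketch:

```python
def speedup(parallel_fraction, cores):
    """Amdahl's law: the overall speedup when only part of a
    job can be split across multiple cores."""
    serial = 1 - parallel_fraction  # the part that can't be shared
    return 1 / (serial + parallel_fraction / cores)

# If half the time is spent pulling files out of the cabinet,
# even 32 cores can't quite double your speed:
print(round(speedup(0.5, 2), 2))   # 1.33
print(round(speedup(0.5, 32), 2))  # 1.94
```

So that 32-core monster only shines when almost all of the work is parallelizable.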
Measuring CPU Performance
If you’ve shopped for a computer, you’ve no doubt seen processor speed advertised. I’m here to tell you that number is absolutely useless without context. Let’s compare my laptop’s Intel i3 350M with my desktop’s Intel Xeon X5650. (Benchmark data found here: http://cpuboss.com/cpus/Intel-Xeon-X5650-vs-Intel-Core-i3-350M) Both CPUs came out in the same year (2010). The 350M is a 2-core processor clocked at 2.26 GHz, while the X5650 is a 6-core processor clocked at 2.66 GHz, which, based on that advertised number, looks to be only about 20% faster. Let’s use what the professionals use to measure performance: benchmark numbers. A benchmark is a standardized task that you run across different hardware, measuring how long each takes to complete. Use that time to get a score, and now you’re able to compare the actual performance of a bunch of different parts.
Looking at PassMark, a multi-threaded benchmark, the 350M scores 1,901 versus the X5650’s 7,510, nearly 4x better. The major difference here is clearly the additional cores. So, number of cores is part of the context.
Looking at GeekBench 3, a single-threaded benchmark, the story gets a little more interesting. The 350M scores a 1,381 while the X5650 scores a 2,176, about 60% better. But remember, the clock speed was only 20% faster, so where did the other 40 percentage points come from? The answer is simply processor design. The X5650 crams in more circuitry, allowing it to run more calculations in the same amount of time.
The takeaway: CPU clock frequency is just one of many properties that determine actual CPU speed. The most accurate measure of performance is comparing benchmarks.
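At its core, a benchmark is just a fixed task plus a stopwatch. Here’s a toy sketch of the idea; the workload and scoring formula are invented for illustration and are nothing like how PassMark or GeekBench actually compute their scores:

```python
import time

def standardized_task():
    # A fixed, repeatable workload -- here, some arithmetic busywork.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def time_once(task):
    # Stopwatch around a single run of the task.
    start = time.perf_counter()
    task()
    return time.perf_counter() - start

def benchmark(task, runs=5):
    # Time several runs, keep the best, and invert it into a score
    # where a bigger number means faster hardware.
    best = min(time_once(task) for _ in range(runs))
    return round(1.0 / best)

print(f"score: {benchmark(standardized_task)}")
```

Because everyone runs the same task, the scores are comparable across wildly different chips, which a GHz number alone never is.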
Graphics Processing Unit (GPU)
On the surface, this one’s pretty obvious: the graphics processing unit is some specialty hardware dedicated to doing the calculations needed to display things on your screen. Basic stuff like displaying text on a screen is super simple and barely needs any extra hardware. Stuff like decoding video files and handling different windows is a little more difficult, but your CPU can usually do that alongside its other tasks. Once you start getting into 3D rendering with shadows and reflections, you need some dedicated hardware. You see, the equations that games and things use to make a pretty picture are not particularly difficult, but they need to be solved for each pixel on your screen. I use a 1920×1080 screen, so that’s 2,073,600 pixels to calculate for, 60x every second.
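The back-of-the-envelope math on that pixel count is worth seeing (assuming, as a simplification, one shading calculation per pixel per frame):

```python
# My screen's resolution and refresh rate.
width, height, fps = 1920, 1080, 60

pixels_per_frame = width * height        # 2,073,600 pixels
calcs_per_second = pixels_per_frame * fps  # over 124 million per second

print(pixels_per_frame)
print(calcs_per_second)
```

Over a hundred million easy little calculations per second is precisely the workload a GPU is built for.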
Therefore, a GPU is really just a special version of a CPU that is optimized to run thousands of easy mathematical operations at once. And because it’s entirely separate computing hardware, GPUs also come with their own RAM to keep their calculation results within easy reach.
Looking deeper at how the GPU does all that is pretty interesting, though. Let’s take a look at my Nvidia GTX 1060 GPU as an example. (Bought it before the cryptocurrency bubble. It’s actually worth more now used than I bought it for new a year ago!) The spec sheet lists it as having 1280 cores running at 1.5 GHz. Compare that to my Intel Xeon X5650 CPU with 6 cores running at 2.66 GHz. Now, I know I just got done talking about how CPU frequency isn’t the best measure of speed. But take my word for it: a single Intel core will beat the pants off a single Nvidia core. Like a professional marathon runner vs. a 6-year-old in a foot race. But for massively parallel tasks like graphics, it’s like asking who can move a pile of small rocks fastest: 6 marathon runners or 1280 children? For a small pile and a long distance (a few difficult tasks), probably the marathoners. For a large pile over a small distance (a bunch of easy tasks), probably the children. The combination of a CPU and a GPU lets you do both kinds of tasks at maximum efficiency.
GPUs are good at massively parallel calculations beyond just graphics. Cryptocurrencies like Bitcoin are based on algorithms designed for highly parallel systems. Certain types of machine learning and image processing lean heavily on matrix math, which in turn boils down to huge numbers of easy equations. So if you hear the term “GPU Accelerated” for some software, it means the program is written to unleash the children on the pile of equations.
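Here’s what “massively parallel” looks like in code: the same tiny formula applied independently to every element, so each core can grab its own slice of the data. This sketch uses plain Python (a real GPU-accelerated program would hand this loop to something like CUDA or OpenCL); the formula is the standard luma weighting for converting a color pixel to grayscale:

```python
def to_gray(pixel):
    # Each pixel's gray value depends only on that pixel's own RGB --
    # no iteration needs any other iteration's result, which is exactly
    # the kind of work a GPU splits across its hundreds of cores.
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # red, green, blue
gray = [to_gray(p) for p in pixels]  # a trivially parallelizable loop

print([round(v) for v in gray])  # [76, 150, 29]
```

Any problem that decomposes into a loop like this, from hashing to matrix multiplies, is a candidate for the 1280 children.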
Multi-core CPUs can do more things at once. But only if those things are written properly.
CPU frequency on its own is a meaningless marketing figure. Bigger does not necessarily mean better.
CPU Benchmarks are a better way to measure performance, but they are far from perfect.
GPU = Graphics Processing Unit = A special kind of CPU with many, many cores.
Watching a team of 6 marathon runners and a team of 1280 children race to move a pile of small rocks would be a hilarious addition to the Olympics.