The metric we all use for CPU utilization is deeply misleading, and getting worse every year. What is CPU utilization? How busy your processors are? No, that's not what it measures. Yes, I'm talking about the "%CPU" metric used everywhere, by everyone.

What you may think 90% CPU utilization means: [figure]

Stalled means the processor was not making forward progress with instructions, and usually happens because it is waiting on memory I/O. The ratio I drew above (between busy and stalled) is what I typically see in production. Chances are, you're mostly stalled, but don't know it.

What does this mean for you? Understanding how much your CPUs are stalled can direct performance tuning efforts between reducing code or reducing memory I/O. Anyone looking at CPU performance, especially on clouds that auto scale based on CPU, would benefit from knowing the stalled component of their %CPU.

The metric we call CPU utilization is really "non-idle time": the time the CPU was not running the idle thread. Your operating system kernel (whatever it is) usually tracks this during context switch. If a non-idle thread begins running, then stops 100 milliseconds later, the kernel considers that CPU utilized that entire time. This metric is as old as time sharing systems. The Apollo Lunar Module guidance computer (a pioneering time sharing system) called its idle thread the "DUMMY JOB", and engineers tracked cycles running it vs real tasks as an important computer utilization metric.

Nowadays, CPUs have become much faster than main memory, and waiting on memory dominates what is still called "CPU utilization". When you see high %CPU in top(1), you might think of the processor as being the bottleneck (the CPU package under the heat sink and fan) when it's really those banks of DRAM.

For a long time processor manufacturers were scaling their clock speed quicker than DRAM was scaling its access latency (the "CPU DRAM gap"). That levelled out around 2005 with 3 GHz processors, and since then processors have scaled using more cores and hyperthreads, plus multi-socket configurations, all putting more demand on the memory subsystem. Processor manufacturers have tried to reduce this memory bottleneck with larger and smarter CPU caches, and faster memory buses and interconnects.

How to tell what the CPUs are really doing

By using Performance Monitoring Counters (PMCs): hardware counters that can be read using Linux perf, and other tools.
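To make the busy-versus-stalled distinction concrete, here is a minimal Python sketch that turns two PMC readings, retired instructions and CPU cycles, into instructions per cycle (IPC). The function names, the sample counts, and the 1.0 threshold are all assumptions made for illustration: the threshold is only a rough rule of thumb, and a sensible cutoff depends on the processor and the workload. In practice the two counts would come from a tool such as Linux perf.

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return IPC: retired instructions divided by CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def interpret_ipc(ipc: float, threshold: float = 1.0) -> str:
    """Give a rough reading of an IPC value.

    The default threshold of 1.0 is an illustrative assumption,
    not a hard rule; tune it for your processor and workload.
    """
    if ipc < threshold:
        return "likely stalled (for example, waiting on memory)"
    return "likely instruction-bound"


if __name__ == "__main__":
    # Hypothetical counts for one measurement interval (not real data).
    sample_instructions = 9_000_000_000
    sample_cycles = 12_000_000_000

    ipc = instructions_per_cycle(sample_instructions, sample_cycles)
    print(f"IPC = {ipc:.2f}: {interpret_ipc(ipc)}")
```

A low IPC alongside a high %CPU suggests the "utilized" time is mostly stall cycles, which points tuning effort toward memory I/O rather than toward reducing code.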