When you're studying computer architecture, you're really learning how to answer one fundamental question: how do we make computers faster? But "faster" isn't as simple as it sounds. Performance metrics give you the vocabulary and mathematical tools to quantify speed, identify bottlenecks, and predict the impact of design changes. Every architectural decision, from pipeline depth to cache size to parallel processing, ultimately shows up in these numbers.
On exams, you're tested on more than definitions. You need to understand how metrics relate to each other, why some metrics can be misleading in isolation, and how to apply formulas like the CPU performance equation and Amdahl's Law to real scenarios. Don't just memorize what each metric measures. Know what architectural factors influence it and when to use one metric over another.
These metrics measure performance in terms of how long things take. Time is the most direct way to evaluate speed, and it answers the user's real question: "How fast will my program run?"
Execution time (also called CPU time) is the total time the processor spends working on a task. It's the single most reliable measure of performance from the user's perspective.
You calculate it with the CPU performance equation:

Execution Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate
Notice that three independent factors feed into this: how many instructions the program requires, how many cycles each instruction takes on average, and how fast the clock ticks. A change to any one of these changes execution time, which is why this equation shows up constantly in exam problems. Lower execution time is always better.
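The equation is easy to check numerically. A minimal sketch, with made-up numbers for illustration:

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = (instructions * cycles/instruction) / (cycles/second)."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2.5 GHz clock.
t = execution_time(2e9, 1.5, 2.5e9)
print(f"{t:.2f} s")  # prints "1.20 s"
```

Halving any one factor halves the result, which is exactly the "three independent levers" point above.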
Latency is the delay from when a task is initiated to when it completes or first responds. While execution time measures total duration, latency emphasizes responsiveness.
Compare: Execution Time vs. Latency: both measure time, but execution time focuses on total duration while latency emphasizes delay before response. Exam questions often ask you to optimize for one or the other, and the strategies differ significantly.
Rate metrics express performance as work completed per unit time. They're useful for comparisons but can be misleading without context about what kind of work is being measured.
Clock speed is measured in Hertz (Hz) and indicates how many cycles the CPU completes per second. Modern processors run in the GHz range (billions of cycles per second).
Here's the catch: clock speed alone doesn't determine performance. A 3 GHz processor isn't necessarily faster than a 2 GHz one if the 3 GHz chip needs more cycles per instruction. Clock speed only tells you one piece of the CPU performance equation. It interacts with CPI and instruction count, so higher frequency only helps if those other factors stay constant (or don't get worse).
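To see why frequency alone can mislead, compare two hypothetical chips running the same 1-billion-instruction program (the CPI values are invented for illustration):

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """CPU performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

insts = 1e9
t_a = execution_time(insts, 2.0, 3e9)  # 3 GHz clock, but CPI of 2.0
t_b = execution_time(insts, 1.0, 2e9)  # 2 GHz clock, but CPI of 1.0
print(t_a, t_b)  # the "slower" 2 GHz chip finishes first (0.5 s vs ~0.67 s)
```

The 2 GHz processor wins because its lower CPI more than compensates for the slower clock.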
MIPS (millions of instructions per second) quantifies instruction throughput:

MIPS = Instruction Count / (Execution Time × 10⁶)
Higher MIPS means more instructions executed per second, but this metric has well-known problems:

- It ignores what each instruction accomplishes, so it can't fairly compare processors with different instruction sets.
- It varies from program to program on the same machine, so there's no single MIPS rating for a processor.
- A compiler that substitutes fewer, more complex instructions can lower the MIPS number while actually reducing execution time.
FLOPS measures how quickly a processor performs floating-point arithmetic, which is the bottleneck in scientific computing, physics simulations, machine learning, and graphics rendering.
Compare: MIPS vs. FLOPS: MIPS counts all instructions while FLOPS counts only floating-point operations. Use MIPS for general-purpose comparisons; use FLOPS when evaluating scientific or graphics workloads. Neither tells the whole story alone.
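Both rate metrics reduce to a count divided by a time. A quick sketch, with invented counts:

```python
def mips(instruction_count, exec_time_s):
    """Millions of instructions per second."""
    return instruction_count / (exec_time_s * 1e6)

def flops(fp_op_count, exec_time_s):
    """Floating-point operations per second."""
    return fp_op_count / exec_time_s

# Hypothetical run: 500M instructions, 50M of them floating-point, in 0.25 s.
print(mips(500e6, 0.25))   # prints 2000.0 (MIPS)
print(flops(50e6, 0.25))   # prints 200000000.0 (0.2 GFLOPS)
```

Note how the same run yields very different-looking numbers depending on which operations you count, which is exactly why neither metric tells the whole story alone.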
Throughput measures the total work completed per unit time, such as tasks, transactions, or jobs processed.
These metrics describe how well the processor uses its resources. They reveal architectural efficiency independent of clock speed.
IPC is the average number of instructions completed each clock cycle. Higher IPC means the processor is getting more useful work done per tick of the clock.
CPI is the average number of clock cycles needed to complete one instruction:

CPI = Total Clock Cycles / Instruction Count
Lower CPI is better because it means instructions execute more efficiently. CPI increases when the processor stalls, which happens due to pipeline hazards, cache misses, and complex instructions that require multiple cycles.
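Stalls can be folded into an effective CPI. A common classroom model (assumed here, with invented parameters) adds memory-stall cycles per instruction on top of the base CPI:

```python
def effective_cpi(base_cpi, miss_rate, miss_penalty, mem_refs_per_inst):
    """Effective CPI = base CPI + average memory-stall cycles per instruction."""
    return base_cpi + mem_refs_per_inst * miss_rate * miss_penalty

# Hypothetical: base CPI 1.0, 2% cache miss rate, 100-cycle miss penalty,
# and 1.3 memory references per instruction.
print(effective_cpi(1.0, 0.02, 100, 1.3))  # prints 3.6
```

A 2% miss rate more than triples the CPI here, which is why cache behavior dominates so many performance discussions.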
When a program uses a mix of instruction types, you can calculate average CPI as a weighted sum:

CPI_avg = Σ (f_i × CPI_i)

where each CPI_i is the cycle cost of instruction type i and f_i is the fraction of instructions that belong to that type.
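The weighted sum takes one line of code. The instruction mix below is invented for illustration:

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cpi_for_that_type) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)

# Hypothetical mix: 50% ALU ops (1 cycle), 30% loads/stores (2 cycles),
# 20% branches (3 cycles).
print(average_cpi([(0.5, 1), (0.3, 2), (0.2, 3)]))  # prints 1.7
```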
Compare: IPC vs. CPI: these are mathematical inverses (IPC = 1 / CPI). Use whichever makes your calculation cleaner. IPC emphasizes "instructions completed" while CPI emphasizes "cycles consumed."
These aren't raw measurements. They're frameworks for predicting and understanding performance improvements before you build anything.
Amdahl's Law predicts the maximum speedup you can achieve by improving only part of a system:

Speedup = 1 / ((1 − f) + f / s)

where f is the fraction of execution time that can be improved and s is the speedup applied to that fraction.
Here's how to apply it step by step:

1. Identify the fraction f of execution time that the improvement affects.
2. Determine the speedup s applied to that fraction.
3. Substitute both into Speedup = 1 / ((1 − f) + f / s).
4. Sanity-check the result against the upper bound 1 / (1 − f), which is what you get as s approaches infinity.
The key insight: the unimproved portion dominates as s grows large. If only 90% of your code is parallelizable (f = 0.9), then even with infinite processors (s → ∞), maximum speedup is 1 / (1 − 0.9) = 10×. That remaining 10% serial code becomes the bottleneck. This is why Amdahl's Law is so important for optimization decisions: it tells you where improvements will actually matter.
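The diminishing returns are easy to see numerically. A minimal sketch that reproduces the 90%-parallel example (using a huge s to stand in for "infinite"):

```python
def amdahl_speedup(f, s):
    """Overall speedup when fraction f of execution time is sped up by factor s."""
    return 1 / ((1 - f) + f / s)

# 90% parallelizable: the serial 10% caps the overall speedup near 10x.
print(round(amdahl_speedup(0.9, 1e12), 2))  # prints 10.0

# Even large per-fraction speedups give sharply diminishing overall returns.
for s in (2, 10, 100):
    print(s, round(amdahl_speedup(0.9, s), 2))
```

Notice that going from s = 10 to s = 100 (ten times the improvement effort) gains less than a 2× further overall speedup.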
Benchmarks are standardized test suites that measure performance across representative workloads.
Compare: Amdahl's Law vs. Benchmarks: Amdahl's Law is theoretical (predicts limits), while benchmarks are empirical (measure actual performance). Use Amdahl's Law to guide design decisions; use benchmarks to validate real-world results.
| Concept | Best Examples |
|---|---|
| Time-based performance | Execution Time, Latency |
| Rate-based throughput | Clock Speed, MIPS, FLOPS, Throughput |
| Architectural efficiency | IPC, CPI |
| Theoretical analysis | Amdahl's Law |
| Empirical measurement | Benchmarks (SPEC) |
| CPU performance equation components | Instruction Count, CPI, Clock Rate |
| Parallel speedup limits | Amdahl's Law |
| Workload-specific metrics | FLOPS (scientific), Throughput (servers) |
If Processor A has a higher clock speed than Processor B but longer execution time for the same program, what metric must be worse for Processor A? (Hint: look at the CPU performance equation and think about what else could differ.)
A program spends 80% of its time in parallelizable code. Using Amdahl's Law, what is the maximum possible speedup with infinite processors? (You should get 5×.)
Compare and contrast MIPS and FLOPS. When would each metric be most appropriate for evaluating processor performance?
Given the CPU performance equation, how would doubling the clock speed while also doubling the CPI affect execution time? (Work through the algebra.)
Why might a processor with IPC of 2.0 running at 2 GHz outperform a processor with IPC of 0.8 running at 4 GHz? Which metrics would you calculate to prove this?