Why This Matters
When you're studying computer architecture, you're really learning how to answer one fundamental question: how do we make computers faster? But "faster" isn't as simple as it sounds. Performance metrics give you the vocabulary and mathematical tools to quantify speed, identify bottlenecks, and predict the impact of design changes. Every architectural decision—from pipeline depth to cache size to parallel processing—ultimately shows up in these numbers.
On exams, you're being tested on more than definitions. You need to understand how metrics relate to each other, why some metrics can be misleading in isolation, and how to apply formulas like the CPU performance equation and Amdahl's Law to real scenarios. Don't just memorize what each metric measures—know what architectural factors influence it and when to use one metric over another.
Time-Based Metrics
These metrics measure performance in terms of how long things take—the most intuitive way to evaluate speed. Time-based metrics answer the user's real question: "How fast will my program run?"
Execution Time
- Total time to complete a task—the ultimate measure of performance from the user's perspective
- Calculated using the CPU performance equation: Execution Time = (Instruction Count × CPI) / Clock Speed
- Lower is always better—this is the metric that actually matters for comparing real-world performance
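The CPU performance equation above can be sketched with made-up numbers (instruction count, CPI, and clock speed here are hypothetical, purely for illustration):

```python
# Hypothetical workload: 2 billion instructions, average CPI of 1.5,
# running on a 3 GHz processor.
instruction_count = 2_000_000_000
cpi = 1.5
clock_speed_hz = 3_000_000_000  # 3 GHz = 3e9 cycles per second

# Execution Time = (Instruction Count × CPI) / Clock Speed
execution_time = (instruction_count * cpi) / clock_speed_hz
print(f"Execution time: {execution_time:.2f} s")  # → Execution time: 1.00 s
```

Note that the numerator (Instruction Count × CPI) is just total cycles, so the equation reduces to cycles divided by cycles-per-second.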
Latency
- Delay from task initiation to completion—measures responsiveness rather than raw speed
- Critical for real-time applications like gaming, video conferencing, and interactive systems where delays are noticeable
- Affected by memory access times, network delays, and pipeline stalls—not just CPU speed
Compare: Execution Time vs. Latency—both measure time, but execution time focuses on total duration while latency emphasizes delay before response. FRQs often ask you to optimize for one or the other, and the strategies differ significantly.
Rate-Based Metrics
Rate metrics express performance as work completed per unit time. They're useful for comparisons but can be misleading without context about what kind of work is being measured.
Clock Speed (Frequency)
- Measured in Hertz (Hz)—indicates how many cycles the CPU completes per second (modern processors run in GHz)
- Not a complete performance indicator—a 3 GHz processor isn't necessarily faster than a 2 GHz one if architectures differ
- Interacts with CPI and instruction count—higher frequency only helps if other factors remain constant
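The claim that a higher clock speed can lose to a better architecture is easy to check numerically. A small sketch with hypothetical processors (the specs are invented for illustration):

```python
# Hypothetical comparison: Processor A runs at 3 GHz with CPI 2.0,
# Processor B at 2 GHz with CPI 1.0, on the same 1-billion-instruction program.
instructions = 1_000_000_000

time_a = (instructions * 2.0) / 3e9  # ≈ 0.667 s
time_b = (instructions * 1.0) / 2e9  # = 0.500 s

# Despite a 50% lower clock speed, B finishes first because its CPI is half of A's.
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")
```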
MIPS (Million Instructions Per Second)
- Quantifies instruction throughput: MIPS = Instruction Count / (Execution Time × 10⁶)
- Misleading across different architectures—a CISC processor may do more work per instruction than a RISC processor
- Ignores instruction complexity—simple NOPs count the same as complex floating-point operations
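A quick sketch of the MIPS formula, using invented run statistics for illustration:

```python
# Hypothetical run: 500 million instructions completed in 0.25 seconds.
instruction_count = 500_000_000
execution_time_s = 0.25

# MIPS = Instruction Count / (Execution Time × 10^6)
mips = instruction_count / (execution_time_s * 1e6)
print(f"{mips:.0f} MIPS")  # → 2000 MIPS
```

Notice the formula says nothing about what those instructions accomplished, which is exactly why MIPS comparisons break down across architectures.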
FLOPS (Floating-Point Operations Per Second)
- Measures floating-point calculation speed—essential for scientific computing, simulations, and graphics
- More meaningful than MIPS for numerical workloads—directly measures the operations that matter for these applications
- Reported in megaFLOPS, gigaFLOPS, or teraFLOPS—supercomputers are ranked by peak FLOPS
Compare: MIPS vs. FLOPS—MIPS counts all instructions while FLOPS counts only floating-point operations. Use MIPS for general-purpose comparisons; use FLOPS when evaluating scientific or graphics workloads. Neither tells the whole story alone.
Throughput
- Work completed per unit time—measured in tasks, transactions, or jobs processed
- Critical for servers and batch processing—where processing volume matters more than individual task speed
- Can improve even if individual latency stays constant—through parallelism and pipelining
Efficiency Metrics
These metrics describe how well the processor uses its resources. They reveal architectural efficiency independent of clock speed.
Instructions Per Cycle (IPC)
- Average instructions completed each clock cycle—higher IPC means better resource utilization
- Varies by workload and architecture—branch-heavy code typically has lower IPC than straight-line computation
- Modern superscalar processors target IPC > 1—executing multiple instructions simultaneously
Cycles Per Instruction (CPI)
- Average cycles needed per instruction: CPI = Total Cycles / Instruction Count
- Lower CPI is better—indicates more efficient instruction execution
- Affected by pipeline stalls, cache misses, and instruction mix—complex instructions and memory delays increase CPI
Compare: IPC vs. CPI—these are mathematical inverses (IPC = 1 / CPI). Use whichever makes your calculation cleaner. IPC emphasizes "instructions completed" while CPI emphasizes "cycles consumed."
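Because IPC and CPI are reciprocals, the performance equation can be written either way and gives the same answer. A short sketch with hypothetical values:

```python
# IPC and CPI are reciprocals, so either form fits the performance equation.
cpi = 1.25
ipc = 1 / cpi  # 0.8

# With CPI: Execution Time = (Instruction Count × CPI) / Clock Speed
# With IPC: Execution Time = Instruction Count / (IPC × Clock Speed)
instructions = 1_000_000_000
clock_hz = 2e9  # hypothetical 2 GHz processor
t_via_cpi = instructions * cpi / clock_hz
t_via_ipc = instructions / (ipc * clock_hz)
assert abs(t_via_cpi - t_via_ipc) < 1e-12  # identical, as expected
```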
Analysis Tools
These aren't raw measurements—they're frameworks for predicting and understanding performance improvements.
Amdahl's Law
- Predicts maximum speedup from partial optimization: Speedup = 1 / ((1 − P) + P/S), where P is the parallelizable fraction and S is the speedup of that portion
- Reveals diminishing returns—if only 90% of code is parallelizable, maximum speedup is 10× no matter how many processors you add
- Essential for optimization decisions—tells you where improvements will actually matter
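Amdahl's Law is simple enough to sketch directly; the example below reproduces the 90%-parallelizable case from the bullets above (the processor counts are illustrative):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1 / ((1 - p) + p / s)

# 90% parallelizable code: the serial 10% caps the overall speedup at 10x.
print(amdahl_speedup(0.9, 4))    # 4-way parallelism → ~3.08x overall
print(amdahl_speedup(0.9, 1e9))  # "infinite" processors → approaches 10x
```

The second call shows the diminishing-returns point: even an astronomically large S cannot push the overall speedup past 1 / (1 − P).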
Benchmarks (e.g., SPEC)
- Standardized test suites that measure performance across representative workloads
- Enable fair comparisons—same tests run on different systems eliminate workload variability
- SPEC CPU benchmarks are industry standard—separate suites for integer (SPECint) and floating-point (SPECfp) performance
Compare: Amdahl's Law vs. Benchmarks—Amdahl's Law is theoretical (predicts limits), while benchmarks are empirical (measure actual performance). Use Amdahl's Law to guide design decisions; use benchmarks to validate real-world results.
Quick Reference Table
| Purpose | Metrics / Tools |
| --- | --- |
| Time-based performance | Execution Time, Latency |
| Rate-based throughput | Clock Speed, MIPS, FLOPS, Throughput |
| Architectural efficiency | IPC, CPI |
| Theoretical analysis | Amdahl's Law |
| Empirical measurement | Benchmarks (SPEC) |
| CPU performance equation components | Instruction Count, CPI, Clock Speed |
| Parallel speedup limits | Amdahl's Law |
| Workload-specific metrics | FLOPS (scientific), Throughput (servers) |
Self-Check Questions
- If Processor A has a higher clock speed than Processor B but longer execution time for the same program, what metric must be worse for Processor A?
- A program spends 80% of its time in parallelizable code. Using Amdahl's Law, what is the maximum possible speedup with infinite processors?
- Compare and contrast MIPS and FLOPS—when would each metric be most appropriate for evaluating processor performance?
- Given the CPU performance equation, how would doubling the clock speed while also doubling the CPI affect execution time?
- Why might a processor with IPC of 2.0 running at 2 GHz outperform a processor with IPC of 0.8 running at 4 GHz? Which metrics would you calculate to prove this?