💾 Intro to Computer Architecture

Computer Performance Metrics


Why This Matters

When you're studying computer architecture, you're really learning how to answer one fundamental question: how do we make computers faster? But "faster" isn't as simple as it sounds. Performance metrics give you the vocabulary and mathematical tools to quantify speed, identify bottlenecks, and predict the impact of design changes. Every architectural decision, from pipeline depth to cache size to parallel processing, ultimately shows up in these numbers.

On exams, you're tested on more than definitions. You need to understand how metrics relate to each other, why some metrics can be misleading in isolation, and how to apply formulas like the CPU performance equation and Amdahl's Law to real scenarios. Don't just memorize what each metric measures. Know what architectural factors influence it and when to use one metric over another.


Time-Based Metrics

These metrics measure performance in terms of how long things take. Time is the most direct way to evaluate speed, and it answers the user's real question: "How fast will my program run?"

Execution Time

Execution time (also called CPU time) is the total time the processor spends working on a task. It's the single most reliable measure of performance from the user's perspective.

You calculate it with the CPU performance equation:

$$\text{Execution Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Notice that three independent factors feed into this: how many instructions the program requires, how many cycles each instruction takes on average, and how fast the clock ticks. A change to any one of these changes execution time, which is why this equation shows up constantly in exam problems. Lower execution time is always better.
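The equation above can be checked with a quick numerical sketch. The instruction count, CPI, and clock rate below are made-up illustrative values, not measurements of any real chip:

```python
# CPU performance equation:
#   execution time = (instruction count * CPI) / clock rate
# All three inputs are illustrative, invented values.
instruction_count = 2_000_000_000  # 2 billion instructions
cpi = 1.5                          # average cycles per instruction
clock_rate_hz = 2.5e9              # 2.5 GHz

execution_time_s = (instruction_count * cpi) / clock_rate_hz
print(execution_time_s)  # 1.2 seconds
```

Halving any one factor halves the result, which is exactly why the three factors are worth tracking separately.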

Latency

Latency is the delay from when a task is initiated to when it completes or first responds. While execution time measures total duration, latency emphasizes responsiveness.

  • Critical for real-time applications like gaming, video conferencing, and interactive systems where even small delays are noticeable
  • Affected by memory access times, network delays, and pipeline stalls, not just raw CPU speed
  • A system can have high throughput but poor latency (or vice versa), so you need to know which one the application cares about

Compare: Execution Time vs. Latency: both measure time, but execution time focuses on total duration while latency emphasizes delay before response. Exam questions often ask you to optimize for one or the other, and the strategies differ significantly.


Rate-Based Metrics

Rate metrics express performance as work completed per unit time. They're useful for comparisons but can be misleading without context about what kind of work is being measured.

Clock Speed (Frequency)

Clock speed is measured in Hertz (Hz) and indicates how many cycles the CPU completes per second. Modern processors run in the GHz range (billions of cycles per second).

Here's the catch: clock speed alone doesn't determine performance. A 3 GHz processor isn't necessarily faster than a 2 GHz one if the 3 GHz chip needs more cycles per instruction. Clock speed only tells you one piece of the CPU performance equation. It interacts with CPI and instruction count, so higher frequency only helps if those other factors stay constant (or don't get worse).
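A quick sketch of that catch, with hypothetical numbers for both chips:

```python
# Same program (same instruction count) on two hypothetical chips.
instructions = 1_000_000_000

time_a = (instructions * 2.0) / 3.0e9  # 3 GHz chip, but CPI = 2.0 -> ~0.667 s
time_b = (instructions * 1.2) / 2.0e9  # 2 GHz chip with CPI = 1.2 -> 0.6 s

# The lower-clocked chip wins because its better CPI more than
# compensates for the slower clock.
print(time_a > time_b)  # True
```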

MIPS (Million Instructions Per Second)

MIPS quantifies instruction throughput:

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6}$$

Higher MIPS means more instructions executed per second, but this metric has well-known problems:

  • Misleading across architectures: A CISC processor may accomplish more work per instruction than a RISC processor, so fewer MIPS could still mean faster program completion.
  • Ignores instruction complexity: A simple no-op (NOP) counts the same as a complex floating-point multiply.
  • Can vary across programs: The same processor will report different MIPS values on different workloads, making it unreliable as a single comparison number.
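The cross-architecture caveat can be made concrete with a hedged sketch (all instruction counts and times below are invented for illustration):

```python
# MIPS = instruction count / (execution time * 10^6)
# Invented numbers: a RISC-style chip needs more (simpler) instructions
# than a CISC-style chip for the same program.
risc_instructions, risc_time_s = 4.0e9, 4.0
cisc_instructions, cisc_time_s = 1.5e9, 3.0

risc_mips = risc_instructions / (risc_time_s * 1e6)  # 1000.0 MIPS
cisc_mips = cisc_instructions / (cisc_time_s * 1e6)  # 500.0 MIPS

# Double the MIPS, yet the program finishes a second later.
print(risc_mips > cisc_mips, risc_time_s > cisc_time_s)  # True True
```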

FLOPS (Floating-Point Operations Per Second)

FLOPS measures how quickly a processor performs floating-point arithmetic, which is often the bottleneck in scientific computing, physics simulations, machine learning, and graphics rendering.

  • More meaningful than MIPS for numerical workloads because it directly measures the operations those applications depend on
  • Reported at various scales: megaFLOPS ($10^6$), gigaFLOPS ($10^9$), teraFLOPS ($10^{12}$), and petaFLOPS ($10^{15}$)
  • Supercomputers are ranked by peak FLOPS (the TOP500 list uses the LINPACK benchmark to measure this)

Compare: MIPS vs. FLOPS: MIPS counts all instructions while FLOPS counts only floating-point operations. Use MIPS for general-purpose comparisons; use FLOPS when evaluating scientific or graphics workloads. Neither tells the whole story alone.

Throughput

Throughput measures the total work completed per unit time, such as tasks, transactions, or jobs processed.

  • Critical for servers and batch processing, where processing volume matters more than individual task speed
  • Can improve even if individual latency stays constant, through parallelism and pipelining
  • For example, a pipelined processor might not finish any single instruction faster, but it finishes more instructions per second overall
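The pipelining example above can be sketched numerically, assuming a hypothetical 5-stage pipeline with 1 ns stages:

```python
# Pipelining: per-instruction latency is unchanged, but throughput rises.
# Hypothetical 5-stage pipeline, 1 ns per stage, 1 million instructions.
stages = 5
stage_ns = 1.0
n = 1_000_000

unpipelined_ns = n * stages * stage_ns        # one instruction at a time: 5,000,000 ns
pipelined_ns = (stages + (n - 1)) * stage_ns  # overlap after the pipeline fills: 1,000,004 ns

# Each instruction still takes 5 ns start-to-finish (latency),
# yet the pipeline retires ~1 instruction per ns in steady state (throughput).
print(unpipelined_ns, pipelined_ns)
```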

Efficiency Metrics

These metrics describe how well the processor uses its resources. They reveal architectural efficiency independent of clock speed.

Instructions Per Cycle (IPC)

IPC is the average number of instructions completed each clock cycle. Higher IPC means the processor is getting more useful work done per tick of the clock.

  • Varies by workload and architecture: branch-heavy code typically has lower IPC than straight-line computation because mispredicted branches waste cycles
  • Modern superscalar processors target IPC > 1 by executing multiple instructions simultaneously through techniques like out-of-order execution and multiple functional units

Cycles Per Instruction (CPI)

CPI is the average number of clock cycles needed to complete one instruction:

$$\text{CPI} = \frac{\text{Total Cycles}}{\text{Instruction Count}}$$

Lower CPI is better because it means instructions execute more efficiently. CPI increases when the processor stalls, which happens due to pipeline hazards, cache misses, and complex instructions that require multiple cycles.

When a program uses a mix of instruction types, you can calculate average CPI as a weighted sum:

$$\text{CPI}_{\text{avg}} = \sum_{i} (\text{CPI}_i \times \text{Fraction}_i)$$

where each $\text{CPI}_i$ is the cycle cost of instruction type $i$ and $\text{Fraction}_i$ is how often that type appears.
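A minimal sketch of the weighted sum, assuming a made-up mix of ALU operations, loads, and branches:

```python
# Weighted-average CPI for an invented instruction mix:
#   ALU ops:  CPI 1, 50% of instructions
#   loads:    CPI 3, 30%
#   branches: CPI 2, 20%
mix = [(1, 0.50), (3, 0.30), (2, 0.20)]

cpi_avg = sum(cpi_i * fraction_i for cpi_i, fraction_i in mix)
print(cpi_avg)  # 1.8 (up to float rounding)
```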

Compare: IPC vs. CPI: these are mathematical inverses ($\text{IPC} = \frac{1}{\text{CPI}}$). Use whichever makes your calculation cleaner. IPC emphasizes "instructions completed" while CPI emphasizes "cycles consumed."


Analytical Tools

These aren't raw measurements. They're frameworks for predicting and understanding performance improvements before you build anything.

Amdahl's Law

Amdahl's Law predicts the maximum speedup you can achieve by improving only part of a system:

$$\text{Speedup} = \frac{1}{(1-P) + \frac{P}{S}}$$

where $P$ is the fraction of execution time that can be improved and $S$ is the speedup applied to that fraction.

Here's how to apply it step by step:

  1. Identify what fraction of execution time is affected by the improvement (that's $P$).
  2. Determine how much faster that portion becomes (that's $S$).
  3. Plug into the formula. The $(1-P)$ term represents the part you can't improve, which sets a hard ceiling on overall speedup.

The key insight: the unimproved portion dominates as $S$ grows large. If only 90% of your code is parallelizable ($P = 0.9$), then even with infinite processors ($S \to \infty$), maximum speedup is $\frac{1}{1 - 0.9} = 10\times$. That remaining 10% serial code becomes the bottleneck. This is why Amdahl's Law is so important for optimization decisions: it tells you where improvements will actually matter.
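The law is easy to encode and experiment with; a minimal sketch:

```python
# Amdahl's Law: overall speedup when a fraction p of execution time
# is accelerated by a factor s. (p and s are hypothetical inputs.)
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# 90% of the work sped up 10x: 1 / (0.1 + 0.09) ~ 5.26x overall
print(amdahl_speedup(0.9, 10))

# Even an enormous s cannot pass the 1/(1-p) = 10x ceiling
print(amdahl_speedup(0.9, 1e12))
```

Plugging in ever-larger values of `s` is a quick way to see the ceiling flatten out.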

Benchmarks (e.g., SPEC)

Benchmarks are standardized test suites that measure performance across representative workloads.

  • They enable fair comparisons by running the same tests on different systems, eliminating workload variability
  • SPEC CPU benchmarks are the industry standard, with separate suites for integer performance (SPECint) and floating-point performance (SPECfp)
  • Results are reported as ratios relative to a reference machine, so a SPECint score of 50 means 50× faster than the reference on integer workloads
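A one-line sketch of the ratio arithmetic, using invented times. (Real SPEC scores aggregate many component benchmarks, typically via a geometric mean of per-benchmark ratios, so this shows only the basic idea:)

```python
# SPEC-style ratio sketch: score = (reference machine time) / (measured time).
# Both times below are invented for illustration.
reference_time_s = 1000.0
measured_time_s = 20.0

score = reference_time_s / measured_time_s
print(score)  # 50.0 -> "50x faster than the reference"
```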

Compare: Amdahl's Law vs. Benchmarks: Amdahl's Law is theoretical (predicts limits), while benchmarks are empirical (measure actual performance). Use Amdahl's Law to guide design decisions; use benchmarks to validate real-world results.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Time-based performance | Execution Time, Latency |
| Rate-based throughput | Clock Speed, MIPS, FLOPS, Throughput |
| Architectural efficiency | IPC, CPI |
| Theoretical analysis | Amdahl's Law |
| Empirical measurement | Benchmarks (SPEC) |
| CPU performance equation components | Instruction Count, CPI, Clock Rate |
| Parallel speedup limits | Amdahl's Law |
| Workload-specific metrics | FLOPS (scientific), Throughput (servers) |

Self-Check Questions

  1. If Processor A has a higher clock speed than Processor B but longer execution time for the same program, what metric must be worse for Processor A? (Hint: look at the CPU performance equation and think about what else could differ.)

  2. A program spends 80% of its time in parallelizable code. Using Amdahl's Law, what is the maximum possible speedup with infinite processors? (You should get 5×.)

  3. Compare and contrast MIPS and FLOPS. When would each metric be most appropriate for evaluating processor performance?

  4. Given the CPU performance equation, how would doubling the clock speed while also doubling the CPI affect execution time? (Work through the algebra.)

  5. Why might a processor with IPC of 2.0 running at 2 GHz outperform a processor with IPC of 0.8 running at 4 GHz? Which metrics would you calculate to prove this?