💾Intro to Computer Architecture Unit 8 Review

8.2 Amdahl's Law and speedup analysis

Written by the Fiveable Content Team • Last updated August 2025

Amdahl's Law predicts how much faster a program can run when you spread its work across multiple processors. It reveals a fundamental constraint: the serial (non-parallelizable) portion of a program puts a hard ceiling on speedup, no matter how many processors you throw at it. Understanding this law helps you decide whether adding processors is worthwhile or whether your time is better spent optimizing the serial code.

Amdahl's Law in Parallel Computing

Concept and Implications

Amdahl's Law gives you a formula for the theoretical maximum speedup of a program running on multiple processors. The core idea is that every program has two parts: a parallelizable fraction (work that can be split across processors) and a serial fraction (work that must run on a single processor, one step at a time).

The serial fraction is the bottleneck. As you add more processors, the parallel portion gets faster, but the serial portion stays the same. This means speedup hits a wall determined entirely by how large the serial fraction is.

  • If 10% of a program is serial, the maximum theoretical speedup is 1 / 0.10 = 10×, even with infinite processors.
  • If 50% is serial, the ceiling drops to just 2×.
  • Diminishing returns kick in quickly: each additional processor contributes less speedup than the one before it.

This has a practical consequence that's easy to overlook. Shrinking the serial portion often matters more than adding processors. For example, reducing the serial fraction from 20% to 10% doubles your maximum theoretical speedup (from 5× to 10×).
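The ceiling argument comes straight from the limit of the formula: with infinite processors the parallel term vanishes, leaving 1 divided by the serial fraction. A minimal sketch (the function name is illustrative):

```python
def max_speedup(serial_fraction):
    """Upper bound on speedup with unlimited processors: 1 / serial fraction."""
    return 1.0 / serial_fraction

# Halving the serial fraction doubles the ceiling.
print(max_speedup(0.20))  # 5× ceiling
print(max_speedup(0.10))  # 10× ceiling
```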

Speedup Calculation Formula

The formula is:

S(n) = 1 / ((1 - P) + P/n)

where:

  • P = the fraction of the program that can be parallelized
  • (1 - P) = the serial fraction
  • n = the number of processors

Example: If 75% of a program is parallelizable (P = 0.75) and you run it on 4 processors:

S(4) = 1 / ((1 - 0.75) + 0.75/4) = 1 / (0.25 + 0.1875) = 1 / 0.4375 ≈ 2.29

Notice that even though you quadrupled the processors, you didn't get anywhere close to a 4× speedup. That gap is entirely due to the 25% serial portion.
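The formula is a one-liner in code, which makes it easy to experiment with different fractions and processor counts. A minimal sketch (function name is illustrative):

```python
def amdahl_speedup(p, n):
    """Theoretical speedup for parallelizable fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The example above: 75% parallelizable, 4 processors.
print(round(amdahl_speedup(0.75, 4), 2))  # 2.29
```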

Theoretical Speedup Calculation


Steps to Calculate Speedup

  1. Identify the parallelizable fraction (P). The serial fraction is (1 - P).

  2. Determine the number of processors (n) you'll use.

  3. Plug into the formula: S(n) = 1 / ((1 - P) + P/n)

  4. Compute the result by evaluating the denominator first, then dividing 1 by it.

Worked example: A program has 60% parallelizable code (P = 0.6) and runs on 8 processors.

  • Serial fraction: 1 - 0.6 = 0.4
  • Parallel portion per processor: 0.6 / 8 = 0.075
  • Denominator: 0.4 + 0.075 = 0.475
  • Speedup: S(8) = 1 / 0.475 ≈ 2.11

Even with 8 processors, the 40% serial fraction keeps speedup just above 2×.

Impact of Parallelizable Fraction and Processor Count

The parallelizable fraction P has a much larger effect on speedup than the processor count n. Here's a comparison:

| P    | n = 4 | n = 16 | n = ∞ (ceiling) |
|------|-------|--------|-----------------|
| 0.50 | 1.60  | 1.88   | 2.00            |
| 0.80 | 2.50  | 4.00   | 5.00            |
| 0.90 | 3.08  | 6.40   | 10.00           |
| 0.95 | 3.48  | 9.14   | 20.00           |

Two things stand out in this table. First, going from P = 0.50 to P = 0.90 at 16 processors more than triples your speedup (1.88 to 6.40). Second, going from 4 to 16 processors at P = 0.80 gains you only 1.5× more speedup (2.50 to 4.00) despite quadrupling hardware. The returns from adding processors diminish fast, especially when the serial portion is large.
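Every entry in a table like this follows from the same formula, so you can regenerate it (or extend it to other values of P and n) in a few lines. A sketch, reusing an `amdahl_speedup` helper defined for this purpose:

```python
def amdahl_speedup(p, n):
    """Theoretical speedup for parallelizable fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.80, 0.90, 0.95):
    s4 = round(amdahl_speedup(p, 4), 2)
    s16 = round(amdahl_speedup(p, 16), 2)
    ceiling = round(1.0 / (1.0 - p), 2)   # limit as n goes to infinity
    print(f"P = {p:.2f}:  n=4 -> {s4},  n=16 -> {s16},  ceiling -> {ceiling}")
```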

Amdahl's Law Limitations


Assumptions and Real-World Constraints

Amdahl's Law is a useful model, but it makes several simplifying assumptions that don't always hold in practice.

  • Fixed problem size. The law assumes you're solving the same-sized problem with more processors. In many real applications, you scale the problem up as you add processors (called weak scaling). Gustafson's Law addresses this alternative scenario and often gives a more optimistic view of parallel speedup.
  • No parallelization overhead. The formula ignores the cost of coordinating processors: communication, synchronization, thread creation, and data sharing. These overheads grow as you add processors, so actual speedup is typically lower than what Amdahl's Law predicts.
  • Perfect load balancing. The law assumes work divides evenly across all processors. In practice, dependencies between tasks or uneven data partitions can leave some processors idle while others are still working, which wastes resources and reduces speedup.
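The overhead point can be made concrete with a toy model. The linear coordination cost c·n added to the denominator below is an illustrative assumption, not part of Amdahl's formula; real overhead curves depend on the system:

```python
def amdahl_with_overhead(p, n, c=0.01):
    """Amdahl's speedup plus a toy per-processor coordination cost.
    The c * n term is an illustrative assumption, not part of the law."""
    return 1.0 / ((1.0 - p) + p / n + c * n)

# With overhead, speedup peaks and then declines as processors are added.
for n in (4, 16, 64):
    print(n, round(amdahl_with_overhead(0.95, n), 2))
```

This is why measured speedup curves often bend downward past some processor count, while the pure Amdahl curve only flattens.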

Other Factors Affecting Parallel Performance

Several real-world factors fall outside the scope of Amdahl's Law:

  • Memory hierarchy and data locality. When processors need to access shared memory or data stored on remote nodes, latency and bandwidth limitations can slow things down significantly. Keeping data close to the processor that needs it (locality optimization) is critical.
  • Resource contention. Parallel systems often run multiple programs at once. Processors competing for shared cache, memory bandwidth, or interconnect capacity can degrade each program's performance.
  • The serial fraction isn't always fixed. Some code that looks serial can sometimes be restructured. Techniques like pipelining, speculative execution, or algorithmic redesign can shrink the serial portion, pushing the speedup ceiling higher than the original analysis suggested.

Speedup Analysis Techniques

Measuring and Comparing Speedup

To evaluate a real parallel program, you compare measured performance against the theoretical prediction.

  1. Measure sequential execution time (T_seq) by running the program on a single processor.
  2. Measure parallel execution time (T_par) on your target number of processors.
  3. Calculate actual speedup: S_actual = T_seq / T_par

For example, if the sequential run takes 100 seconds and the 4-processor run takes 30 seconds, the actual speedup is 100 / 30 ≈ 3.33.

Now compare S_actual to the theoretical S(n) from Amdahl's Law. If actual speedup falls well below the theoretical value, something is eating into your performance: overhead, load imbalance, or memory bottlenecks are the usual suspects.
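The comparison is simple arithmetic. A sketch using the timings from the example above; the parallel-efficiency metric (actual speedup divided by processor count) is a standard companion measure, not something Amdahl's Law itself defines:

```python
def actual_speedup(t_seq, t_par):
    """Measured speedup: sequential time over parallel time."""
    return t_seq / t_par

def parallel_efficiency(t_seq, t_par, n):
    """Fraction of the ideal n-fold speedup actually achieved."""
    return actual_speedup(t_seq, t_par) / n

# 100 s sequential, 30 s on 4 processors:
print(round(actual_speedup(100, 30), 2))          # 3.33
print(round(parallel_efficiency(100, 30, 4), 2))  # 0.83
```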

Profiling and Optimization Strategies

Scalability analysis helps you understand how your program behaves as you change resources:

  • Strong scaling: Keep the problem size fixed and increase the number of processors. You're testing how well the program parallelizes at that size.
  • Weak scaling: Increase both the problem size and processor count proportionally. You're testing whether the program maintains performance as the workload grows.
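These two regimes map onto the two laws mentioned earlier: Amdahl's Law models strong scaling, while Gustafson's Law (scaled speedup = (1 - P) + P·n) models weak scaling. A sketch comparing them, under the simplifying assumption that the same serial fraction applies in both regimes:

```python
def amdahl(p, n):
    """Strong scaling: fixed problem size, more processors."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    """Weak scaling: problem size grows in proportion to n."""
    return (1.0 - p) + p * n

for n in (4, 16, 64):
    print(n, round(amdahl(0.9, n), 2), round(gustafson(0.9, n), 2))
```

The gap between the two columns widens quickly, which is why weak-scaling analyses tend to look far more optimistic than strong-scaling ones.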

Profiling tools like Intel VTune, GNU gprof, or OpenMP profiling interfaces help you pinpoint where time is actually being spent. Focus your optimization on the sections that consume the most time, since small improvements to hot spots yield the biggest gains.

Parallelization strategies to consider:

  • Data parallelism: Partition the data and apply the same operation to each partition simultaneously. Works well when the same computation repeats across a large dataset.
  • Task parallelism: Distribute independent tasks across processors. Best when the program has many distinct operations that don't depend on each other.
  • Pipeline parallelism: Split computation into stages and feed data through them concurrently, similar to an assembly line. Useful when processing a stream of inputs through multiple steps.

Choosing the right strategy depends on the structure of your problem and the architecture you're targeting. Often, the best approach involves combining strategies for different parts of the program.
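As a minimal illustration of the data-parallel pattern, the sketch below partitions a dataset into chunks and applies the same operation to each. It uses a thread pool to keep the example self-contained; for CPU-bound work in Python, a process pool would be the usual choice to sidestep the global interpreter lock. All names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # The same operation is applied to every partition (data parallelism).
    return sum(x * x for x in chunk)

data = list(range(1000))
# Partition the data into four equal chunks, one per worker.
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_chunk, chunks))

# Combining the partial results gives the same answer as a sequential pass.
print(sum(partial_sums))
```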