💾 Intro to Computer Architecture

CPU Pipeline Stages


Why This Matters

Understanding CPU pipeline stages is fundamental to grasping how modern processors achieve high performance. You're being tested on concepts like instruction-level parallelism, pipeline hazards, throughput vs. latency tradeoffs, and the fetch-decode-execute cycle. These stages don't exist in isolation. They work together to allow multiple instructions to be "in flight" simultaneously, which is why a 5-stage pipeline can theoretically improve throughput by up to 5x compared to single-cycle execution.
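That "up to 5x" claim can be made precise with a quick back-of-the-envelope calculation (a sketch, not part of the original text): measured in stage-times, single-cycle execution of n instructions on a k-stage design costs n·k, while an ideal pipeline costs k + (n − 1), so the speedup n·k / (k + n − 1) approaches k as n grows.

```python
def pipeline_speedup(n_instructions, n_stages):
    """Ideal speedup of a k-stage pipeline over single-cycle execution,
    ignoring hazards and stalls. Single-cycle costs n*k stage-times;
    the pipeline costs k cycles to fill plus one per remaining instruction."""
    n, k = n_instructions, n_stages
    return (n * k) / (k + n - 1)

print(pipeline_speedup(1, 5))      # 1.0 -- a lone instruction gains nothing
print(pipeline_speedup(1000, 5))   # ~4.98 -- approaches 5x for long streams
```

Real pipelines fall short of this bound because hazards insert stall cycles, which is exactly what the rest of this guide is about.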

When exam questions ask about pipeline stalls, data hazards, or control hazards, they're really testing whether you understand what each stage does and what resources it needs. Don't just memorize the stage names. Know what hardware components are active at each stage, what data flows between stages, and what happens when dependencies force the pipeline to wait. That conceptual understanding will help you tackle scenarios involving hazard detection, forwarding, and branch prediction.


Instruction Preparation Stages

These first two stages focus on getting the instruction ready for execution: fetching it from memory and figuring out what it actually means. Both stages interact heavily with memory and control logic before any real computation happens.

Instruction Fetch (IF)

  • The Program Counter (PC) holds the address of the next instruction. The CPU sends this address to instruction memory (or the instruction cache) and reads back the instruction stored there.
  • The fetched instruction is placed into the Instruction Register (IR), which holds it stable while the next stage decodes it.
  • The PC increments automatically (typically by 4 bytes in a 32-bit architecture like MIPS, since each instruction is 4 bytes wide). This prepares the CPU to fetch the next sequential instruction unless a branch redirects execution elsewhere.
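The three bullets above can be sketched as a tiny fetch function (hypothetical names; instruction memory modeled as a dict mapping byte addresses to 32-bit words):

```python
def fetch(pc, instruction_memory):
    """IF stage sketch: read the word at PC into the IR and compute the
    next sequential PC (+4 bytes, since each MIPS instruction is 4 bytes)."""
    ir = instruction_memory[pc]   # value latched into the Instruction Register
    next_pc = pc + 4              # sequential fetch unless a branch redirects
    return ir, next_pc

# Two example instruction words at addresses 0 and 4 (encodings illustrative).
imem = {0: 0x20080005, 4: 0x200900AA}
ir, pc = fetch(0, imem)
print(hex(ir), pc)   # the fetched word, and the PC already advanced to 4
```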

Instruction Decode (ID)

  • The opcode field is parsed to determine the operation type. This is where the control unit generates the control signals that configure every downstream stage (ALU operation, memory read/write enables, register write enable, etc.).
  • The register file is read to retrieve operand values specified by the source register fields (typically rs and rt in MIPS). These values are latched into the ID/EX pipeline register so the next stage can use them.
  • Sign extension happens here for immediate values, converting a 16-bit immediate to a full 32-bit value so it can be used in ALU operations.
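Both decode jobs described above are pure bit manipulation, so they are easy to sketch (hypothetical helper names; field positions follow the standard MIPS 32-bit layout):

```python
def decode_fields(instr):
    """Split a 32-bit MIPS-style instruction word into its fixed-position
    fields: opcode [31:26], rs [25:21], rt [20:16], rd [15:11], imm [15:0]."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs":     (instr >> 21) & 0x1F,
        "rt":     (instr >> 16) & 0x1F,
        "rd":     (instr >> 11) & 0x1F,
        "imm":    instr & 0xFFFF,
    }

def sign_extend_16(imm16):
    """Extend a 16-bit two's-complement immediate to 32 bits: if the sign
    bit (bit 15) is set, fill the upper 16 bits with ones."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16

print(decode_fields(0x20080005))        # addi-style word: rt=8, imm=5
print(hex(sign_extend_16(0xFFFB)))      # 0xfffffffb, i.e. -5 in 32 bits
```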

Compare: IF vs. ID. Both happen before any computation, but IF interacts with instruction memory while ID interacts with the register file. If a question asks where a data hazard is detected, ID is your answer, since that's where register values are read and the dependency becomes visible.


Computation and Data Stages

These middle stages perform the actual work: calculating results and accessing data memory. This is where the ALU does its job and where load/store instructions interact with the memory hierarchy.

Execute (EX)

  • The ALU performs the core operation for the instruction: arithmetic (+, −), logical (AND, OR, shift), comparison, or address calculation for memory instructions.
  • ALU inputs come from either two register values (R-type instructions) or one register value plus a sign-extended immediate (I-type instructions). A mux controlled by signals from ID selects the correct input.
  • For branch instructions, the branch target address is calculated here by adding the sign-extended, shifted offset to the PC. This calculation happens even before the CPU knows whether the branch will be taken. That's important because it means the address is ready if the branch condition (also evaluated in EX) turns out to be true.
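The branch-target arithmetic in the last bullet is worth seeing concretely (a sketch with hypothetical names): the 16-bit offset is sign-extended, shifted left by 2 to convert words to bytes, and added to PC + 4.

```python
def branch_target(pc_plus_4, imm16):
    """Branch target = (PC + 4) + sign-extended offset << 2.
    The shift turns a word offset into a byte offset (4 bytes/instruction)."""
    off = imm16 & 0xFFFF
    if off & 0x8000:
        off -= 0x10000            # reinterpret as a negative two's-complement
    return pc_plus_4 + (off << 2)

# A backward branch of -5 instructions from PC+4 = 0x0040001C:
print(hex(branch_target(0x0040001C, 0xFFFB)))   # 0x400008
# A forward branch of +3 instructions from PC+4 = 8:
print(branch_target(8, 0x0003))                 # 20
```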

Memory Access (MEM)

  • Data memory is accessed only for load and store instructions. All other instruction types pass through this stage doing nothing with memory. The result from EX simply flows through the EX/MEM pipeline register to the next stage.
  • For loads, the effective address computed in EX is sent to data memory, and the data at that address is read out. For stores, the address is sent along with the register value to be written.
  • Cache misses here create pipeline stalls, since the entire pipeline must wait for data to arrive from slower memory levels (L2 cache, L3, or main memory). This is one of the biggest sources of real-world performance loss.
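The pass-through behavior described above can be captured in a few lines (a sketch, with data memory as a dict and a simplified control signal):

```python
def mem_stage(ctrl, alu_result, store_value, data_memory):
    """MEM stage sketch: only loads and stores touch data memory;
    all other instructions pass the ALU result straight through."""
    if ctrl == "load":
        return data_memory[alu_result]          # read word at effective address
    if ctrl == "store":
        data_memory[alu_result] = store_value   # write register value to memory
        return None                             # stores produce no WB result
    return alu_result                           # R-type/immediate: pass-through

dmem = {100: 42}
print(mem_stage("load", 100, None, dmem))   # 42
mem_stage("store", 104, 7, dmem)
print(dmem[104])                            # 7
print(mem_stage("alu", 99, None, dmem))     # 99, untouched by memory
```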

Compare: EX vs. MEM. EX uses the ALU for computation, while MEM uses data memory for storage access. R-type instructions only need EX; load/store instructions need both. This distinction matters for understanding which hazards affect which instruction types. A load-use hazard is particularly nasty because the data isn't available until the end of MEM, one stage later than an R-type result would be.
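The load-use condition has a standard formulation in the textbook 5-stage design, which can be sketched as follows (register numbers are illustrative): stall when the instruction currently in EX is a load whose destination (rt) matches a source register of the instruction in ID.

```python
def load_use_stall(ex_is_load, ex_rt, id_rs, id_rt):
    """Stall one cycle when the instruction in EX is a load whose
    destination matches a source of the instruction in ID. Forwarding
    can't help here: the loaded value only exists at the end of MEM,
    one stage after an R-type result would have been available."""
    return ex_is_load and ex_rt != 0 and ex_rt in (id_rs, id_rt)

# lw $8, 0($9) followed immediately by add $10, $8, $11 -> must stall:
print(load_use_stall(True, 8, 8, 11))    # True
# Same pair, but the producer is an R-type add -> forwarding suffices:
print(load_use_stall(False, 8, 8, 11))   # False
```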


Result Completion Stage

The final stage ensures computed results become visible to future instructions. Without this stage, no instruction would ever produce a lasting change to the register file.

Write Back (WB)

  • The destination register receives the result. A mux selects between the ALU output (for R-type and immediate operations) or the data loaded from memory (for load instructions), and that value is written into the register file.
  • The register file write port is used here. The write register number is determined by the instruction format: rd for R-type instructions, rt for loads and I-type instructions.
  • Store instructions skip this stage since they write to data memory, not to registers. Their work was completed in MEM. Similarly, branch instructions that don't link a return address have nothing to write back.
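The mux described in the first bullet reduces to a one-line selection (a sketch with a hypothetical MemtoReg-style control flag):

```python
def write_back(mem_to_reg, alu_result, mem_data):
    """WB mux sketch: select loaded memory data for loads,
    the ALU result for R-type and immediate instructions."""
    return mem_data if mem_to_reg else alu_result

regfile = [0] * 32
regfile[8] = write_back(True, 0, 42)    # load: value read from memory wins
regfile[9] = write_back(False, 7, 0)    # R-type: ALU result wins
print(regfile[8], regfile[9])           # 42 7
```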

Compare: MEM vs. WB. Both can provide a "final result," but MEM provides data read from memory (loads) while WB writes any result into the register file. Understanding this split is essential for implementing data forwarding paths, because you need to know when each type of result is actually available.


Quick Reference Table

| Concept | Details |
| --- | --- |
| Memory interaction | IF (instruction memory), MEM (data memory) |
| Register file access | ID (read), WB (write) |
| ALU usage | EX stage exclusively |
| Control signal generation | ID stage |
| Address calculation | EX (for branches and memory operations) |
| Pipeline register boundaries | IF/ID, ID/EX, EX/MEM, MEM/WB |
| Stages skipped by some instructions | MEM (by R-type), WB (by stores) |

Self-Check Questions

  1. Which two stages interact with memory, and what type of memory does each access?

  2. If a load instruction is followed immediately by an add instruction that uses the loaded value, at which stage is the data hazard detected, and why can't forwarding alone fix it without a one-cycle stall?

  3. Compare what happens during the EX stage for an R-type arithmetic instruction versus a load word (lw) instruction. What's different about the ALU inputs and what the ALU result represents?

  4. A store instruction (sw) uses the MEM stage but not the WB stage. Explain why this makes sense given what each stage does.

  5. If you were implementing data forwarding to reduce stalls, which stage boundaries would need forwarding paths, and what values would be forwarded? Think about where results first become available (end of EX, end of MEM) and where they're needed (start of EX).