Advanced Computer Architecture

🥸 Advanced Computer Architecture Unit 1 – Intro to Advanced Computer Architecture

Advanced Computer Architecture explores the design and optimization of computer systems, focusing on processor design, memory hierarchy, and parallel processing techniques. It delves into instruction set architectures, pipelining, and cache organization to enhance performance and efficiency.

This field examines how to maximize instruction-level parallelism, implement multi-core architectures, and evaluate system performance. It also considers the impact of technology trends on computer design, balancing performance, power consumption, and reliability in modern systems.

Key Concepts and Foundations

  • Computer architecture encompasses the design, organization, and implementation of computer systems
  • Focuses on the interface between hardware and software, optimizing performance, power efficiency, and reliability
  • Includes the study of instruction set architectures (ISAs), which define the basic operations a processor can execute
  • Explores the organization of processor components such as arithmetic logic units (ALUs), control units, and registers
  • Examines memory hierarchy, including cache, main memory, and secondary storage, to optimize data access and minimize latency
  • Investigates parallel processing techniques, such as pipelining and multi-core architectures, to enhance performance
  • Considers the impact of technology trends, such as Moore's Law and the power wall, on computer architecture design

Processor Design Principles

  • Processors are designed to execute instructions efficiently and quickly, maximizing performance while minimizing power consumption
  • RISC (Reduced Instruction Set Computing) architectures emphasize simple, fixed-length instructions that can be executed in a single cycle
    • Examples of RISC architectures include ARM and MIPS
  • CISC (Complex Instruction Set Computing) architectures support more complex, variable-length instructions that may require multiple cycles to execute
    • x86 is a well-known example of a CISC architecture
  • Pipelining is a technique that overlaps the execution of multiple instructions, allowing the processor to begin executing a new instruction before the previous one has completed
  • Superscalar architectures enable the execution of multiple instructions simultaneously by duplicating functional units (such as ALUs)
  • Out-of-order execution allows instructions to be executed in a different order than they appear in the program, based on data dependencies and resource availability
  • Branch prediction techniques, such as static and dynamic prediction, aim to minimize the impact of control hazards caused by conditional branching instructions
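
As a concrete example of dynamic prediction, below is a minimal sketch of a 2-bit saturating-counter predictor; the table size, the program-counter hashing, and the branch trace are illustrative assumptions, not any specific processor's design.

```python
# Minimal 2-bit saturating-counter branch predictor (dynamic prediction sketch).
# Counter states: 0-1 predict not-taken, 2-3 predict taken.

TABLE_SIZE = 16  # illustrative; real predictors use thousands of entries

def predict_and_update(table, pc, taken):
    """Predict the branch at address `pc`, then train on the actual outcome."""
    index = pc % TABLE_SIZE          # simple hash of the program counter
    prediction = table[index] >= 2   # predict taken in states 2 and 3
    if taken:
        table[index] = min(table[index] + 1, 3)  # saturate at strongly taken
    else:
        table[index] = max(table[index] - 1, 0)  # saturate at strongly not-taken
    return prediction

table = [1] * TABLE_SIZE  # start weakly not-taken
# A loop branch taken 9 times then not taken once, repeated twice:
trace = ([True] * 9 + [False]) * 2
hits = sum(predict_and_update(table, pc=0x40, taken=t) == t for t in trace)
print(f"correct predictions: {hits}/{len(trace)}")  # the counter learns the loop
```

The two-bit hysteresis is the key design choice: a single loop-exit misprediction does not flip the prediction, so the next loop iteration is still predicted correctly.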

Memory Hierarchy and Management

  • Memory hierarchy organizes storage devices based on their capacity, speed, and cost, with faster and more expensive memory closer to the processor
  • Registers are the fastest and most expensive memory, located within the processor and used for temporary storage of operands and results
  • Cache memory is a small, fast memory between the processor and main memory, designed to store frequently accessed data and instructions
    • Caches are organized into levels (L1, L2, L3) with increasing capacity and latency
  • Main memory (RAM) is larger and slower than cache, storing the active portions of programs and data
  • Secondary storage (hard drives, SSDs) has the largest capacity but the slowest access times, used for long-term storage of programs and data
  • Virtual memory techniques, such as paging and segmentation, allow the operating system to manage memory by providing a logical address space larger than the physical memory
  • Memory management units (MMUs) translate logical addresses to physical addresses and handle memory protection and allocation
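
A minimal sketch of the logical-to-physical translation an MMU performs under paging appears below; the 4 KiB page size is a common convention, and the page-table contents are made-up illustrative values.

```python
# Paged address translation sketch: logical address -> (page, offset) -> physical.
PAGE_SIZE = 4096  # 4 KiB pages, a common choice

# Hypothetical page table: logical page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(logical_addr):
    page = logical_addr // PAGE_SIZE      # which logical page
    offset = logical_addr % PAGE_SIZE     # position within the page
    if page not in page_table:
        raise RuntimeError("page fault")  # OS would load the page from disk
    frame = page_table[page]
    return frame * PAGE_SIZE + offset     # physical address

print(hex(translate(0x1234)))  # page 1, offset 0x234 -> frame 2 -> 0x2234
```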

Instruction-Level Parallelism

  • Instruction-level parallelism (ILP) refers to the ability to execute multiple instructions simultaneously within a single processor core
  • ILP can be exploited through techniques such as pipelining, superscalar execution, and out-of-order execution
  • Data dependencies, such as true dependencies (read-after-write) and anti-dependencies (write-after-read), can limit the amount of ILP that can be achieved (the sketch after this list classifies these dependence types)
  • Instruction scheduling techniques, such as scoreboarding and Tomasulo's algorithm, aim to maximize ILP by reordering instructions based on their dependencies
  • Register renaming eliminates false dependencies (write-after-read anti-dependencies and write-after-write output dependencies) by mapping architectural registers to a larger set of physical registers
  • Speculative execution allows the processor to execute instructions before it is certain that they will be needed, based on branch predictions
  • Very Long Instruction Word (VLIW) architectures explicitly specify the parallelism in the instruction stream, placing the burden of scheduling on the compiler
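
The sketch below classifies the dependence types named above (RAW, WAR, WAW) for a toy instruction sequence; the three-operand instruction format and register names are illustrative assumptions.

```python
# Classify data dependencies between pairs of instructions in program order.
# Each instruction is (destination_register, (source_registers...)).
program = [
    ("r1", ("r2", "r3")),  # I0: r1 = r2 + r3
    ("r4", ("r1", "r5")),  # I1: r4 = r1 * r5   (reads I0's result)
    ("r2", ("r6", "r7")),  # I2: r2 = r6 - r7   (overwrites a source of I0)
    ("r1", ("r8", "r9")),  # I3: r1 = r8 + r9   (overwrites I0's destination)
]

for i, (dst_i, srcs_i) in enumerate(program):
    for j in range(i + 1, len(program)):
        dst_j, srcs_j = program[j]
        if dst_i in srcs_j:
            print(f"I{i}->I{j}: RAW (true) on {dst_i}")
        if dst_j in srcs_i:
            print(f"I{i}->I{j}: WAR (anti) on {dst_j}")
        if dst_j == dst_i:
            print(f"I{i}->I{j}: WAW (output) on {dst_i}")
```

Register renaming would remove the WAR and WAW entries from this output, leaving only the true (RAW) dependencies to constrain instruction scheduling.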

Pipelining and Superscalar Architectures

  • Pipelining divides instruction execution into stages (fetch, decode, execute, memory access, write-back), allowing multiple instructions to be in different stages simultaneously
  • Pipeline hazards, such as structural hazards (resource conflicts), data hazards (dependencies), and control hazards (branches), can stall the pipeline and reduce performance (the timing sketch after this list quantifies this cost)
  • Forwarding (bypassing) is a technique used to mitigate data hazards by passing results directly between pipeline stages, avoiding the need to wait for them to be written back to registers
  • Superscalar architectures issue multiple instructions per cycle to multiple functional units, exploiting instruction-level parallelism
  • Dynamic scheduling techniques, such as Tomasulo's algorithm and the Reorder Buffer (ROB), enable out-of-order execution in superscalar processors
  • Branch prediction and speculative execution are crucial for maintaining high performance in pipelined and superscalar architectures
  • Deeply pipelined architectures have a larger number of stages, allowing for higher clock frequencies but increasing the impact of hazards and branch mispredictions
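
As a rough back-of-the-envelope model of these timing effects, the sketch below compares unpipelined and pipelined cycle counts for a classic five-stage pipeline; the instruction count and hazard penalties are illustrative assumptions.

```python
# Idealized pipeline timing: with k stages and no hazards, n instructions
# finish in k + (n - 1) cycles instead of k * n; stalls add cycles back.
def pipelined_cycles(n_instructions, stages, stall_cycles=0):
    return stages + (n_instructions - 1) + stall_cycles

n, k = 100, 5
print("unpipelined:", n * k, "cycles")            # 500 cycles
print("ideal pipeline:", pipelined_cycles(n, k))  # 104 cycles, ~4.8x speedup
# Assume 10 loads each stall 1 cycle and 5 mispredicted branches each cost 3:
print("with hazards:", pipelined_cycles(n, k, stall_cycles=10 * 1 + 5 * 3))  # 119
```

Note how the misprediction penalty scales with pipeline depth, which is why deeply pipelined designs lean so heavily on accurate branch prediction.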

Cache Organization and Optimization

  • Caches are organized into lines (blocks), each containing multiple words of data or instructions
  • Cache mapping policies determine how memory addresses are mapped to cache lines:
    • Direct-mapped caches map each memory address to a single cache line
    • Set-associative caches map each memory address to a set of cache lines, allowing for more flexibility and reduced conflicts
    • Fully-associative caches allow any memory address to be mapped to any cache line, providing the most flexibility but requiring more complex hardware
  • Cache replacement policies, such as Least Recently Used (LRU) and Random, determine which cache line to evict when a new line needs to be brought in (see the lookup sketch after this list)
  • Write policies determine how writes to the cache are handled:
    • Write-through caches immediately update both the cache and main memory on a write
    • Write-back caches update only the cache on a write and mark the line as dirty, writing it back to main memory only when the line is evicted
  • Cache coherence protocols, such as MESI and MOESI, ensure that multiple copies of data in different caches remain consistent in multi-core and multi-processor systems
  • Cache optimization techniques, such as prefetching and victim caches, aim to reduce cache misses and improve performance
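
The sketch below ties together the mapping and replacement policies above with a 2-way set-associative lookup using LRU; the sizes are illustrative, and real hardware tracks recency with a few bits per set rather than a Python list.

```python
# 2-way set-associative cache with LRU replacement (lookup sketch).
NUM_SETS, LINE_SIZE = 4, 16  # illustrative: 4 sets x 2 ways x 16-byte lines

# Each set holds up to 2 tags, ordered most- to least-recently used.
sets = [[] for _ in range(NUM_SETS)]

def access(addr):
    block = addr // LINE_SIZE   # which memory block the address falls in
    index = block % NUM_SETS    # set index bits select the set
    tag = block // NUM_SETS     # remaining bits form the tag
    ways = sets[index]
    if tag in ways:
        ways.remove(tag)        # hit: move the tag to the MRU position
        ways.insert(0, tag)
        return "hit"
    if len(ways) == 2:          # set full: evict the LRU way
        ways.pop()
    ways.insert(0, tag)         # fill the line
    return "miss"

for a in [0x00, 0x40, 0x00, 0x80, 0x40, 0x00]:
    print(hex(a), access(a))   # the last three accesses show conflict misses
```

With direct mapping (one way per set), addresses 0x00, 0x40, and 0x80 would all collide in the same line; higher associativity trades lookup hardware for fewer conflict misses.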

Multi-core and Parallel Processing

  • Multi-core processors integrate multiple processor cores on a single chip, allowing for thread-level parallelism (TLP)
  • Symmetric multiprocessing (SMP) architectures provide each core with equal access to shared memory and resources
  • Non-Uniform Memory Access (NUMA) architectures have memory physically distributed among the cores, with varying access latencies depending on the memory location
  • Shared memory programming models, such as OpenMP and Pthreads, allow developers to express parallelism using threads that communicate through shared variables
  • Message passing programming models, such as MPI, enable parallelism by having processes communicate through explicit messages
  • Synchronization primitives, such as locks, semaphores, and barriers, are used to coordinate access to shared resources and ensure correct parallel execution (illustrated in the sketch after this list)
  • Cache coherence and memory consistency models, such as sequential consistency and relaxed consistency, define the allowable orderings of memory operations in parallel systems
  • Heterogeneous computing architectures, such as GPUs and FPGAs, offer specialized hardware for parallel processing of specific workloads
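
To make the shared-memory model and lock-based synchronization concrete, here is a minimal Python threading sketch; the worker function and iteration counts are illustrative, and OpenMP or Pthreads would express the same pattern in C/C++.

```python
# Shared-memory parallelism: threads communicate through a shared variable,
# and a lock serializes the read-modify-write so no increment is lost.
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:            # without this, concurrent increments can be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                  # barrier-like: wait for all workers to finish

print(counter)  # always 400000 with the lock; may be less without it
```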

Performance Metrics and Evaluation

  • Performance metrics quantify the efficiency and effectiveness of computer architectures in executing programs
  • Execution time measures the total time required to complete a program, including computation, memory accesses, and I/O operations
  • Throughput represents the number of tasks or instructions completed per unit of time (e.g., instructions per second)
  • Latency refers to the time delay between the initiation of an operation and its completion, such as the time to access memory or execute an instruction
  • Speedup compares the performance of an optimized or parallel implementation to a baseline, sequential implementation: $Speedup = \frac{ExecutionTime_{sequential}}{ExecutionTime_{parallel}}$
  • Amdahl's Law describes the maximum speedup achievable through parallelization, based on the fraction of the program that can be parallelized: $Speedup \leq \frac{1}{(1-f)+\frac{f}{N}}$, where $f$ is the parallel fraction and $N$ is the number of processors (a worked example follows this list)
  • Scalability refers to the ability of a system to maintain performance as the problem size or the number of processing elements increases
  • Benchmarks, such as SPEC CPU and PARSEC, provide standardized workloads and metrics for evaluating and comparing the performance of different computer architectures
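
A short worked example of Amdahl's Law as stated above; the 90% parallel fraction is an illustrative assumption.

```python
# Amdahl's Law: upper bound on speedup for a program that is 90% parallelizable.
def amdahl_speedup(f, n):
    """Maximum speedup with parallel fraction f on n processors."""
    return 1.0 / ((1.0 - f) + f / n)

f = 0.90
for n in (2, 4, 16, 1_000_000):
    print(f"N={n:>7}: speedup <= {amdahl_speedup(f, n):.2f}")
# Even with unlimited processors the bound approaches 1/(1-f) = 10x,
# because the 10% serial fraction dominates.
```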


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
