Advanced Computer Architecture Unit 7 – Memory Hierarchy and Cache Design

Memory hierarchy and cache design are crucial components of modern computer architecture. They address the speed gap between processors and main memory by organizing storage devices based on speed, capacity, and cost. This approach exploits the principle of locality to provide the illusion of a large, fast, and inexpensive memory system. Caches, small fast memories close to the processor, play a key role in this hierarchy. They store frequently accessed data, reducing average access time. Cache design involves balancing size, associativity, and replacement policies to optimize performance, considering factors like hit rate, miss rate, and average memory access time.

Memory Hierarchy Basics

  • Memory hierarchy organizes storage devices based on their speed, capacity, and cost
  • Faster memory technologies (registers, cache) are located closer to the processor for quick access
  • Slower memory technologies (main memory, secondary storage) have higher capacities and are further from the processor
  • Principle of locality is the observation that recently accessed data, and data near it, are likely to be accessed again soon
    • Temporal locality refers to the reuse of specific data within a relatively small time duration
    • Spatial locality refers to the use of data elements within relatively close storage locations
  • Memory hierarchy aims to provide the illusion of a large, fast, and inexpensive memory system
  • Hit rate represents the percentage of memory accesses satisfied by a particular level of the memory hierarchy
  • Miss rate is the percentage of memory accesses that are not satisfied by a particular level and require accessing a lower level
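
The two rates above are easy to see with a toy simulation. The sketch below models a tiny fully associative cache that holds just four blocks and replays an address trace with plenty of temporal and spatial locality; every name and constant in it (BLOCK_SIZE, CACHE_BLOCKS, the trace itself) is an illustrative assumption, not a real machine's configuration.

    # Toy hit-rate / miss-rate calculation over an address trace.
    # A tiny "cache" remembers the last few block addresses; a trace with
    # temporal and spatial locality scores a high hit rate even so.

    BLOCK_SIZE = 16          # bytes per block (illustrative)
    CACHE_BLOCKS = 4         # the cache holds only 4 blocks

    def run_trace(addresses):
        cache = []                      # most recently used block at the end
        hits = 0
        for addr in addresses:
            block = addr // BLOCK_SIZE  # nearby bytes share a block (spatial locality)
            if block in cache:
                hits += 1
                cache.remove(block)     # refresh recency
            elif len(cache) == CACHE_BLOCKS:
                cache.pop(0)            # evict the least recently used block
            cache.append(block)
        hit_rate = hits / len(addresses)
        return hit_rate, 1 - hit_rate

    # Sweep a 64-byte region twice: the second pass hits on every access.
    trace = list(range(64)) * 2
    hit_rate, miss_rate = run_trace(trace)
    print(f"hit rate = {hit_rate:.2%}, miss rate = {miss_rate:.2%}")

Only 4 of the 128 accesses miss (the first touch of each 16-byte block), so the hit rate comes out near 97 percent, which is the "illusion of a large, fast memory" in miniature.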

Cache Fundamentals

  • Cache is a small, fast memory located close to the processor that stores frequently accessed data
  • Acts as a buffer between the processor and main memory to reduce the average access time
  • Exploits the principle of locality to improve performance
  • Cache hit occurs when the requested data is found in the cache
  • Cache miss occurs when the requested data is not found in the cache and must be fetched from a lower level of the memory hierarchy
  • Cache size, block size, associativity, and replacement policy are key design parameters that affect cache performance
  • Larger cache sizes can store more data but have higher access latency and power consumption
  • Smaller block sizes reduce the amount of data transferred per cache miss but may increase the miss rate
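
To make the parameters above concrete, the short sketch below works out how cache size, block size, and associativity together determine how many blocks and sets a cache has; the numbers are made-up examples rather than any particular processor's configuration.

    # How the key design parameters relate (values are illustrative).
    cache_size_bytes = 32 * 1024   # total capacity
    block_size_bytes = 64          # bytes brought in per miss
    associativity    = 8           # blocks (ways) per set

    num_blocks = cache_size_bytes // block_size_bytes   # 512 blocks
    num_sets   = num_blocks // associativity            # 64 sets
    print(f"{num_blocks} blocks organized as {num_sets} sets of {associativity} ways")

Halving the block size doubles the number of blocks the same capacity can hold, but each miss then fetches less data, which is exactly the trade-off in the last bullet above.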

Cache Design Principles

  • Principle of locality is the foundation for cache design and optimization
  • Temporal locality suggests keeping recently accessed data in the cache for future use
  • Spatial locality suggests bringing nearby data into the cache along with the requested data
  • Larger cache sizes can capture more locality but have diminishing returns and increased cost and complexity
  • Higher associativity allows more flexible placement of data in the cache but increases the hit time and power consumption
  • Write policies (write-through, write-back) determine how the cache handles write operations and affect performance and data consistency
  • Block size balances the amount of data transferred per cache miss with the potential for increased miss rate
  • Replacement policies (LRU, random) determine which cache block to evict when a miss occurs and a new block needs to be brought in
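
The block-size trade-off above has a simple arithmetic core for a purely sequential scan, where every miss is a cold miss: one miss per block touched. The sketch below computes that miss rate for a few block sizes; the element size and block sizes are illustrative assumptions.

    # Miss rate of a sequential scan as a function of block size
    # (cold misses only: one miss per block, then hits for the rest of it).
    element_size = 8                 # bytes per array element (assumed)

    for block_size in (16, 32, 64, 128):
        elements_per_block = block_size // element_size
        miss_rate = 1 / elements_per_block       # one miss, then hits
        print(f"block size {block_size:4d} B -> miss rate {miss_rate:.3f}")

Larger blocks cut the miss rate for this access pattern, but each miss now transfers more data (a larger miss penalty), and for less regular patterns large blocks can waste space on bytes that are never used.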

Cache Mapping Techniques

  • Cache mapping determines how memory addresses are mapped to cache locations
  • Direct mapping maps each memory block to a specific cache location based on its address
    • Simple and fast but can lead to conflicts and underutilization of cache space
  • Fully associative mapping allows any memory block to be placed in any cache location
    • Flexible but requires complex and expensive hardware for tag comparison
  • Set-associative mapping is a compromise between direct and fully associative mapping
    • Memory blocks are mapped to a specific set based on their address but can be placed anywhere within that set
    • Provides a balance between flexibility and hardware complexity
  • N-way set-associative mapping organizes the cache into sets of N blocks (ways) each; a memory block maps to one set but may occupy any of its N ways (see the address-breakdown sketch after this list)
    • Increases the chances of finding a free block for a new memory block
  • Skewed associative mapping uses different hash functions for each way to reduce conflicts and improve performance
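
All three mapping schemes start from the same breakdown of an address into tag, set index, and block offset; they differ only in how many ways share a set. The sketch below splits an address under assumed power-of-two parameters (64-byte blocks, 64 sets, matching the earlier geometry example); the address itself is arbitrary.

    # Splitting a byte address into tag / set index / block offset.
    block_size = 64                         # bytes  -> 6 offset bits
    num_sets   = 64                         # sets   -> 6 index bits
    offset_bits = block_size.bit_length() - 1
    index_bits  = num_sets.bit_length() - 1

    def split_address(addr):
        offset = addr & (block_size - 1)
        index  = (addr >> offset_bits) & (num_sets - 1)
        tag    = addr >> (offset_bits + index_bits)
        return tag, index, offset

    tag, index, offset = split_address(0x1234ABCD)
    print(f"tag=0x{tag:X}  set index={index}  block offset={offset}")

    # Direct-mapped: the index selects the one line the block may occupy.
    # N-way set-associative: the index selects a set; the tag is compared
    #   against the tags of all N ways in that set.
    # Fully associative: there are no index bits; the tag is compared
    #   against every line in the cache.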

Replacement and Write Policies

  • Replacement policies determine which cache block to evict when a miss occurs and a new block needs to be brought in
  • Least Recently Used (LRU) policy replaces the block that has been accessed least recently
    • Exploits temporal locality by keeping recently accessed blocks in the cache
  • Random replacement policy randomly selects a block to evict
    • Simple to implement but may not perform as well as LRU
  • First-In-First-Out (FIFO) policy replaces the block that was brought into the cache earliest
    • Easy to implement but does not consider the recency of access
  • Write policies determine how the cache handles write operations
  • Write-through policy updates both the cache and main memory on every write operation
    • Ensures data consistency but may increase memory traffic and latency
  • Write-back policy updates only the cache on a write operation and marks the block as dirty
    • Reduces memory traffic but requires writing back dirty blocks to main memory when evicted
  • Write-allocate policy brings a block into the cache on a write miss
    • Ensures future reads to the block can be served from the cache
  • No-write-allocate policy does not bring a block into the cache on a write miss
    • Avoids polluting the cache with blocks that may not be read again
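
The sketch below replays one short block-reference trace through a four-entry fully associative cache under LRU, FIFO, and random replacement; the trace and capacity are made up to show the mechanics of each policy, not to prove one policy better in general.

    # Counting misses under different replacement policies.
    import random

    def simulate(trace, capacity, policy):
        cache, misses = [], 0
        for block in trace:
            if block in cache:
                if policy == "LRU":            # a hit refreshes recency
                    cache.remove(block)
                    cache.append(block)
                continue
            misses += 1                         # miss: evict if full, then insert
            if len(cache) == capacity:
                if policy == "RANDOM":
                    cache.pop(random.randrange(capacity))
                else:                           # LRU and FIFO both evict position 0;
                    cache.pop(0)                # they differ only in whether hits reorder
            cache.append(block)
        return misses

    trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    for policy in ("LRU", "FIFO", "RANDOM"):
        print(policy, simulate(trace, capacity=4, policy=policy), "misses")

On this particular trace LRU misses 8 times while FIFO misses 10, because FIFO ignores the repeated reuse of blocks 1 and 2. The write policies above are orthogonal to this choice; under write-back, an evicted dirty block would additionally be written back to memory at this eviction step.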

Cache Performance Metrics

  • Hit rate is the percentage of memory accesses that are satisfied by the cache
    • Higher hit rates indicate better cache performance and reduced access latency
  • Miss rate is the percentage of memory accesses that are not satisfied by the cache and require accessing a lower level of the memory hierarchy
    • Lower miss rates indicate better cache performance and fewer long-latency memory accesses
  • Average memory access time (AMAT) is the average time to access memory considering both cache hits and misses
    • Calculated as: AMAT = Hit time + (Miss rate × Miss penalty)
  • Miss penalty is the additional time required to fetch data from a lower level of the memory hierarchy on a cache miss
    • Depends on the latency and bandwidth of the lower level memory
  • Cache throughput measures the number of memory accesses that can be processed by the cache per unit of time
    • Affected by cache size, associativity, and access latency
  • Cache efficiency is the ratio of the number of useful memory accesses to the total number of memory accesses
    • Measures the effectiveness of the cache in satisfying memory requests
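
A quick worked example of the AMAT formula above, using illustrative numbers rather than measurements of a real machine:

    # AMAT = hit time + miss rate x miss penalty (values are assumed).
    hit_time     = 1       # cycles for a cache hit
    miss_rate    = 0.05    # 5% of accesses miss
    miss_penalty = 100     # extra cycles to fetch from the next level

    amat = hit_time + miss_rate * miss_penalty
    print(f"AMAT = {hit_time} + {miss_rate} * {miss_penalty} = {amat} cycles")
    # AMAT = 1 + 0.05 * 100 = 6.0 cycles

Even a 5 percent miss rate makes the average access six times slower than a hit here, which is why most of the techniques in this unit aim at shaving either the miss rate or the miss penalty.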

Advanced Cache Architectures

  • Multi-level cache hierarchies use multiple levels of caches (L1, L2, L3) to balance performance and capacity
    • Caches closest to the processor (L1) are smaller and faster, while caches farther away (L2, L3) are larger and slower
  • Inclusive cache hierarchies enforce the inclusion property: every block held in a cache closer to the processor (e.g., L1) must also be present in the larger caches behind it (L2, L3)
    • Simplifies cache coherence and data consistency but may lead to duplication of data
  • Exclusive cache hierarchies do not allow data to be present in multiple levels of the cache hierarchy simultaneously
    • Increases the effective cache capacity but may increase the miss rate and data movement between levels
  • Non-inclusive cache hierarchies do not enforce any inclusion property and allow data to be present in any level of the cache hierarchy
    • Provides flexibility but may complicate cache coherence and data consistency
  • Victim cache is a small, fully associative cache that stores recently evicted blocks from the main cache
    • Reduces the cost of conflict misses by giving recently evicted blocks a second chance before they must be refetched from the next level
  • Trace cache is a specialized cache that stores decoded instructions in the order they were executed
    • Improves instruction fetch performance and reduces the impact of control flow changes
  • Prefetching techniques proactively bring data into the cache before it is requested by the processor
    • Hardware prefetching uses access patterns and heuristics to predict future memory accesses
    • Software prefetching uses explicit instructions to initiate data transfer from memory to cache
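
As a flavor of the "access patterns and heuristics" bullet, the sketch below implements a toy stride detector: once two consecutive miss addresses are separated by the same stride, it predicts the next address and would issue a prefetch for it. This is an illustration of the idea only, not a description of any real prefetcher design.

    # Toy stride-based prefetch heuristic over a stream of miss addresses.
    def prefetch_candidates(miss_addresses):
        prefetches = []
        last_addr, last_stride = None, None
        for addr in miss_addresses:
            if last_addr is not None:
                stride = addr - last_addr
                if stride != 0 and stride == last_stride:
                    prefetches.append(addr + stride)   # stride confirmed: fetch ahead
                last_stride = stride
            last_addr = addr
        return prefetches

    # Misses from walking an array one 64-byte block at a time.
    print(prefetch_candidates([0, 64, 128, 192, 256]))   # [192, 256, 320]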

Memory Hierarchy Optimization

  • Cache optimization techniques aim to improve cache performance and reduce the impact of memory latency
  • Cache blocking (tiling) reorganizes data access patterns to maximize spatial locality and minimize cache misses
    • Divides large data structures into smaller blocks that fit in the cache
  • Loop interchange changes the order of nested loops to improve spatial locality and reduce cache misses
    • Accesses data in the order that maximizes cache utilization
  • Data prefetching brings data into the cache before it is needed, hiding memory latency
    • Hardware prefetching uses access patterns and heuristics to predict future memory accesses
    • Software prefetching inserts explicit prefetch instructions to initiate data transfer
  • Cache-conscious data structures are designed to minimize cache misses and maximize cache utilization
    • Examples include cache-oblivious algorithms, cache-aware trees, and cache-friendly layouts
  • Memory access scheduling reorders memory instructions to hide latency and improve performance
    • Out-of-order execution allows the processor to execute independent instructions while waiting for memory accesses
  • Cache compression techniques reduce the size of cached data to increase the effective cache capacity
    • Compressed caches store compressed data blocks and decompress them on cache hits
  • Cache partitioning allocates cache resources to different applications or threads to minimize interference and improve performance isolation
    • Prevents one application from monopolizing the cache and affecting the performance of others
  • Cache bypassing selectively bypasses the cache for certain memory accesses to avoid polluting the cache with data that is not reused
    • Avoids bringing in data that is only used once or has poor temporal locality
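
The sketch below contrasts a naive matrix multiply with a cache-blocked (tiled) one. In pure Python the interpreter overhead hides any speedup, so treat it strictly as an illustration of the reordered access pattern; the tile size B is an assumption chosen so that three B x B tiles could sit in a cache together.

    # Cache blocking (tiling) of matrix multiplication.
    N = 128        # matrix dimension (illustrative)
    B = 32         # tile size; N is assumed to be a multiple of B

    def matmul_naive(A, Bm, C):
        for i in range(N):
            for j in range(N):
                for k in range(N):              # walks a full row of A and column of Bm
                    C[i][j] += A[i][k] * Bm[k][j]

    def matmul_tiled(A, Bm, C):
        for ii in range(0, N, B):               # loop over tiles...
            for jj in range(0, N, B):
                for kk in range(0, N, B):
                    for i in range(ii, ii + B):        # ...then work inside one tile,
                        for j in range(jj, jj + B):    # reusing data while it is
                            s = C[i][j]                # still resident in the cache
                            for k in range(kk, kk + B):
                                s += A[i][k] * Bm[k][j]
                            C[i][j] = s

    A  = [[1.0] * N for _ in range(N)]
    Bm = [[1.0] * N for _ in range(N)]
    C1 = [[0.0] * N for _ in range(N)]
    C2 = [[0.0] * N for _ in range(N)]
    matmul_naive(A, Bm, C1)
    matmul_tiled(A, Bm, C2)
    print(C1 == C2)    # True: same result, different memory access order

Loop interchange is the lighter-weight cousin: in the naive version, swapping the j and k loops makes the innermost loop walk Bm and C along rows instead of striding down a column of Bm, improving spatial locality without introducing tiles.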


© 2024 Fiveable Inc. All rights reserved.