💾 Intro to Computer Architecture Unit 1 – Computer Architecture: Core Concepts
Computer architecture forms the foundation of modern computing systems, encompassing the design and organization of hardware and software components. This unit explores key elements like CPUs, memory systems, and I/O devices, as well as fundamental concepts such as data representation and instruction set architectures.
The unit delves into CPU design, memory hierarchies, and performance optimization techniques. It also covers advanced topics like multiprocessor systems, GPU architectures, and emerging technologies such as quantum computing and neuromorphic systems, providing a comprehensive overview of this rapidly evolving field.
Key Components of Computer Architecture
Encompasses the design and organization of a computer system's hardware and software components
Includes the CPU (Central Processing Unit), which executes instructions and performs arithmetic and logical operations
Incorporates memory systems for storing data and instructions (RAM, cache, hard drives)
Features input/output (I/O) devices for interacting with the external environment (keyboard, mouse, display)
Utilizes buses for communication and data transfer between components; a toy sketch of one bus read cycle follows this list
Address bus carries memory addresses
Data bus transfers data between components
Control bus carries control signals and synchronizes operations
Defines the instruction set architecture (ISA) specifying the machine language instructions supported by the processor
Focuses on optimizing performance, power efficiency, and cost-effectiveness in computer system design
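To make the bus roles above concrete, here is a minimal C sketch of one memory read: the CPU drives an address onto the address bus, asserts a read signal on the control bus, and the value comes back on the data bus. The struct, field names, and 256-byte memory are made up for illustration and do not correspond to any real hardware interface.

```c
#include <stdint.h>
#include <stdio.h>

#define MEM_SIZE 256

/* Toy model of the three buses involved in one memory read. */
typedef struct {
    uint16_t address_bus;  /* carries the memory address              */
    uint8_t  data_bus;     /* carries the data value being transferred */
    uint8_t  control_read; /* control signal: 1 = read, 0 = write      */
} BusCycle;

static uint8_t memory[MEM_SIZE]; /* pretend main memory */

/* Perform one read cycle: drive the address and control lines,
   then sample the data bus with the value memory returns. */
uint8_t bus_read(BusCycle *cycle, uint16_t addr) {
    cycle->address_bus  = addr;
    cycle->control_read = 1;
    cycle->data_bus     = memory[addr % MEM_SIZE];
    return cycle->data_bus;
}

int main(void) {
    memory[0x10] = 42;
    BusCycle cycle;
    printf("read mem[0x10] = %u\n", (unsigned)bus_read(&cycle, 0x10));
    return 0;
}
```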
Data Representation and Storage
Computers represent and store data using the binary number system (0s and 1s)
A binary digit (bit) is the smallest unit of data
Bits are grouped into larger units called bytes (8 bits) or words (typically 32 or 64 bits)
Numeric data is represented using fixed-point or floating-point formats
Fixed-point represents integers and fractions with a fixed number of bits for each part
Floating-point represents real numbers using a mantissa and exponent (IEEE 754 standard); a bit-level sketch follows this list
Character data is encoded using ASCII (American Standard Code for Information Interchange) or Unicode standards
Images are represented using pixel grids with color values (RGB, CMYK) or compressed formats (JPEG, PNG)
Data is stored in memory cells or on secondary storage devices (hard drives, SSDs)
Memory is organized into addressable locations, each with a unique memory address
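As a quick illustration of these encodings, the C sketch below prints the raw bit pattern of a 32-bit IEEE 754 float (sign, exponent, mantissa) and the ASCII code of a character. The helper function name is invented for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print the 32 bits of an IEEE 754 single-precision float:
   1 sign bit, 8 exponent bits, 23 mantissa (fraction) bits. */
static void print_float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret the bytes safely */
    for (int i = 31; i >= 0; i--) {
        putchar((bits >> i) & 1 ? '1' : '0');
        if (i == 31 || i == 23) putchar(' ');  /* separate sign | exponent | mantissa */
    }
    putchar('\n');
}

int main(void) {
    print_float_bits(-0.75f);                   /* sign=1, exponent=01111110, mantissa=100... */
    printf("'A' is stored as ASCII code %d\n", 'A');     /* 65 */
    printf("an 8-bit byte holds values 0..%d\n", 0xFF);  /* 255 */
    return 0;
}
```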
Instruction Set Architecture (ISA)
Defines the interface between hardware and software in a computer system
Specifies the set of machine language instructions supported by the processor
Includes instruction formats, addressing modes, data types, and registers
CISC (Complex Instruction Set Computing) architectures have a large number of complex instructions
Examples: x86 (Intel), 68000 (Motorola)
RISC (Reduced Instruction Set Computing) architectures have a smaller set of simpler instructions
Examples: ARM, MIPS, SPARC
Instructions are fetched from memory, decoded, executed, and results are stored back in memory or registers (a toy interpreter sketch follows this list)
Assembly language provides a human-readable representation of machine language instructions
Compilers translate high-level programming languages into machine language instructions
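To show the fetch–decode–execute cycle at the ISA level, here is a toy C sketch of a hypothetical accumulator machine with four made-up opcodes. Real ISAs such as x86 or ARM are far richer, so treat this encoding purely as an illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Made-up opcodes for a toy accumulator machine. */
enum { OP_LOAD = 0, OP_ADD = 1, OP_STORE = 2, OP_HALT = 3 };

int main(void) {
    uint8_t memory[16] = {
        /* program: acc = mem[10]; acc += mem[11]; mem[12] = acc; halt */
        OP_LOAD, 10, OP_ADD, 11, OP_STORE, 12, OP_HALT, 0,
        0, 0, 7, 5, 0, 0, 0, 0   /* data: mem[10]=7, mem[11]=5 */
    };
    uint8_t pc = 0, acc = 0, running = 1;

    while (running) {
        uint8_t opcode  = memory[pc];                   /* fetch  */
        uint8_t operand = memory[pc + 1];
        pc += 2;
        switch (opcode) {                               /* decode + execute */
        case OP_LOAD:  acc = memory[operand];                    break;
        case OP_ADD:   acc = (uint8_t)(acc + memory[operand]);   break;
        case OP_STORE: memory[operand] = acc;                    break; /* write back */
        case OP_HALT:  running = 0;                              break;
        }
    }
    printf("mem[12] = %u\n", (unsigned)memory[12]);     /* prints 12 */
    return 0;
}
```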
CPU Design and Organization
The CPU is the brain of the computer, responsible for executing instructions and performing computations
Consists of the arithmetic logic unit (ALU), control unit, registers, and cache memory
Fetches instructions from memory, decodes them, executes operations, and stores results
Pipelining improves performance by overlapping the execution of multiple instructions
Stages: instruction fetch, decode, execute, memory access, write back
Superscalar architectures execute multiple instructions simultaneously using multiple execution units
Out-of-order execution reorders instructions to maximize resource utilization and minimize dependencies
Branch prediction techniques (static, dynamic) optimize the execution of conditional branches; a 2-bit predictor sketch follows this list
Multi-core processors integrate multiple CPU cores on a single chip for parallel processing
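As one example of dynamic branch prediction, the C sketch below implements the classic 2-bit saturating counter (counter values 2 and 3 predict taken). It is a simplified illustration of only one of many predictor designs.

```c
#include <stdio.h>

/* 2-bit saturating counter: 0,1 -> predict not taken; 2,3 -> predict taken. */
static int counter = 2;            /* start weakly "taken" */

int predict(void) { return counter >= 2; }

void update(int actually_taken) {
    if (actually_taken && counter < 3) counter++;
    if (!actually_taken && counter > 0) counter--;
}

int main(void) {
    /* Simulated branch outcomes: taken 9 times, then not taken once
       (typical of a loop that exits after 10 iterations). */
    int outcomes[10] = {1,1,1,1,1,1,1,1,1,0};
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        if (predict() == outcomes[i]) correct++;
        update(outcomes[i]);
    }
    printf("correct predictions: %d/10\n", correct);   /* 9/10 */
    return 0;
}
```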
Memory Hierarchy and Management
Memory hierarchy organizes storage devices based on speed, capacity, and cost
Registers, cache, main memory (RAM), secondary storage (hard drives, SSDs)
Registers are the fastest and most expensive, located within the CPU
Cache memory (L1, L2, L3) stores frequently accessed data and instructions to reduce memory access latency
Temporal locality: recently accessed data is likely to be accessed again
Spatial locality: nearby memory locations are likely to be accessed together
Main memory (RAM) stores active programs and data, accessed by the CPU through the memory controller
Virtual memory allows the use of secondary storage (hard drive) as an extension of main memory
Paging divides memory into fixed-size pages, swapped between main memory and secondary storage
Segmentation divides memory into variable-size segments based on logical divisions of a program
Memory management unit (MMU) handles address translation and memory protection
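To make paging concrete, the C sketch below splits a virtual address into a page number and an offset and looks the page up in a tiny, made-up page table. A real MMU uses multi-level tables, a TLB, and protection bits, so this only sketches the address arithmetic.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096u            /* 4 KiB pages => 12 offset bits */
#define OFFSET_BITS 12u

/* Tiny, made-up page table: virtual page number -> physical frame number. */
static const uint32_t page_table[4] = { 7, 3, 9, 1 };

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;        /* virtual page number    */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);     /* offset within the page */
    uint32_t frame  = page_table[vpn % 4];         /* "walk" the toy table   */
    return (frame << OFFSET_BITS) | offset;        /* physical address       */
}

int main(void) {
    uint32_t vaddr = 0x1ABC;                       /* page 1, offset 0xABC */
    printf("virtual 0x%X -> physical 0x%X\n",
           (unsigned)vaddr, (unsigned)translate(vaddr));
    return 0;
}
```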
Input/Output (I/O) Systems
I/O systems enable communication between the computer and external devices
Includes input devices (keyboard, mouse, touchscreen) and output devices (display, printer, speakers)
I/O controllers manage the transfer of data between the CPU, memory, and I/O devices
Examples: USB controller, graphics card, network interface card (NIC)
Interrupts allow I/O devices to signal the CPU when they require attention
Interrupt handler routines process the interrupts and perform necessary actions
Direct memory access (DMA) enables I/O devices to access memory directly, bypassing the CPU
Buses (PCIe, USB, SATA) provide standardized interfaces for connecting I/O devices to the system
Device drivers are software components that facilitate communication between the operating system and I/O devices
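Device drivers typically reach an I/O controller through memory-mapped registers. The C sketch below shows the usual volatile-pointer idiom with an entirely invented UART-style register layout; the device is simulated with an ordinary struct so the example runs on a host machine rather than real hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical device register block (made up for illustration).
   On real hardware this struct would be mapped at a fixed physical
   address; here an ordinary variable stands in for the device. */
typedef struct {
    uint32_t status;   /* bit 0 = transmitter ready          */
    uint32_t data;     /* write a byte here to send it       */
} UartRegs;

#define TX_READY (1u << 0)

static UartRegs fake_uart = { .status = TX_READY, .data = 0 };

/* A driver uses a volatile pointer so every register access really
   happens, instead of being optimized away or reordered. */
void uart_putc(volatile UartRegs *uart, char c) {
    while ((uart->status & TX_READY) == 0)
        ;                              /* busy-wait for the device  */
    uart->data = (uint32_t)c;          /* write to the data register */
}

int main(void) {
    uart_putc(&fake_uart, 'A');
    printf("last byte written to the device: '%c'\n", (char)fake_uart.data);
    return 0;
}
```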
Performance Metrics and Optimization
Performance metrics evaluate the efficiency and speed of a computer system
Clock speed (measured in hertz) is the number of clock cycles the processor completes per second
Instructions per cycle (IPC) measures the average number of instructions executed per clock cycle
Execution time is the total time taken to complete a task; it can be estimated as instruction count divided by (IPC × clock speed)
Throughput represents the number of tasks completed per unit of time
Latency is the delay between the initiation of an operation and its completion
Amdahl's Law states that the overall speedup of a system is limited by its sequential (non-parallelizable) parts; a worked example follows this list
Optimization techniques include:
Instruction-level parallelism (ILP) exploits parallelism within a single instruction stream
Data-level parallelism (DLP) performs the same operation on multiple data elements simultaneously
Thread-level parallelism (TLP) executes multiple threads concurrently on different processor cores
Compiler optimizations (loop unrolling, code vectorization) improve code efficiency
Cache optimization (blocking, prefetching) reduces memory access latency
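Amdahl's Law can be written as speedup = 1 / ((1 − p) + p / s), where p is the fraction of the work that can be accelerated and s is the speedup of that part. The short C sketch below evaluates it for a few illustrative values.

```c
#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction p of the work
   is accelerated by a factor s and the rest stays sequential. */
double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    /* 90% of the program parallelizes; vary the number of cores. */
    double p = 0.90;
    int cores[] = { 2, 4, 16, 1024 };
    for (int i = 0; i < 4; i++)
        printf("%4d cores -> %.2fx speedup\n", cores[i], amdahl(p, cores[i]));
    /* Even with unlimited cores the speedup is capped at 1/(1-p) = 10x. */
    return 0;
}
```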
Advanced Topics and Future Trends
Multiprocessor systems integrate multiple processors on a single system for parallel processing
Shared memory multiprocessors share a common memory address space
Distributed memory multiprocessors have separate memory for each processor
GPU (Graphics Processing Unit) architectures are optimized for parallel processing of graphics and general-purpose computations
Heterogeneous computing combines different types of processors (CPU, GPU, FPGA) to leverage their unique strengths
Neuromorphic computing mimics the structure and function of biological neural networks for energy-efficient and adaptive computing
Quantum computing utilizes quantum bits (qubits) and quantum operations to solve certain problems (such as integer factoring) far faster than the best known classical algorithms
Near-memory and in-memory computing architectures place computation closer to memory to reduce data movement and improve performance
3D chip stacking technologies (e.g., through-silicon vias) enable vertical integration of multiple chip layers for increased density and bandwidth
Emerging non-volatile memory technologies (PCM, MRAM, ReRAM) promise higher density, lower power consumption, and persistent storage compared to traditional DRAM and NAND flash