📡 Systems Approach to Computer Networks Unit 20 – Distributed Systems & Cloud Computing

Distributed systems and cloud computing form the backbone of modern digital infrastructure. These technologies enable scalable, fault-tolerant applications by connecting multiple computers through networks. From client-server architectures to microservices, various models address different needs in distributed computing.

Cloud computing delivers on-demand resources over the internet, offering flexibility and cost-efficiency. Service models like IaaS, PaaS, and SaaS cater to diverse requirements, while deployment options include public, private, and hybrid clouds. Key concepts like virtualization and containerization underpin cloud infrastructure, enabling efficient resource utilization.

Key Concepts

  • Distributed systems consist of multiple autonomous computers that communicate through a computer network to achieve a common goal
  • Key characteristics of distributed systems include resource sharing, openness, concurrency, scalability, fault tolerance, and transparency
  • Challenges in distributed systems involve coordination, synchronization, reliability, and security due to the distributed nature of components
  • CAP theorem states that a distributed data store cannot simultaneously guarantee all three of consistency, availability, and partition tolerance; since network partitions cannot be ruled out, designers must effectively choose between consistency and availability when a partition occurs
  • Consensus algorithms (Paxos, Raft) enable multiple nodes in a distributed system to agree on a single value or state, ensuring consistency
  • Eventual consistency is a consistency model where all nodes eventually converge to the same state, allowing for higher availability and partition tolerance
  • Distributed algorithms (leader election, mutual exclusion) coordinate activities and maintain consistency across nodes in a distributed system
  • Middleware abstracts the complexities of distributed systems, providing a unified programming model and hiding low-level details from developers
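
The consistency-versus-availability trade-off above can be sketched with quorum replication, a common technique: with N replicas, if a write waits for W acknowledgements and a read queries R replicas, then R + W > N guarantees every read quorum overlaps every write quorum. This is a minimal illustrative sketch, not a production protocol:

```python
# Quorum-based replication sketch: R + W > N means every read quorum
# overlaps every write quorum, so a read always sees the newest
# acknowledged write (versions break ties).

class QuorumStore:
    def __init__(self, n=5, w=3, r=3):
        assert w + r > n, "need R + W > N for strong consistency"
        self.replicas = [{} for _ in range(n)]  # replica: key -> (version, value)
        self.n, self.w, self.r = n, w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # A real system waits for any W acknowledgements; here we write to
        # the first W replicas to simulate a partial write.
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Query R replicas (the last R, forcing overlap with the write set)
        # and keep the value with the highest version.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None

store = QuorumStore()
store.write("x", "v1")
store.write("x", "v2")
print(store.read("x"))  # quorum overlap guarantees the reader sees "v2"
```

Dropping the R + W > N requirement turns this into an eventually consistent design: reads may return stale values until replicas converge.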

Distributed System Architectures

  • Client-server architecture consists of clients requesting services from servers, with servers processing requests and returning results to clients
  • Peer-to-peer (P2P) architecture enables nodes to act as both clients and servers, allowing direct communication and resource sharing among nodes
    • P2P systems can be structured (DHTs like Chord, Kademlia), unstructured (Gnutella), or hybrid (BitTorrent, which combines trackers with a Kademlia-based DHT)
  • Three-tier architecture separates presentation, application processing, and data management into distinct layers, enhancing scalability and maintainability
  • Microservices architecture decomposes applications into loosely coupled, independently deployable services that communicate through well-defined APIs
    • Benefits of microservices include flexibility, scalability, and technology diversity
  • Event-driven architecture uses events to trigger and communicate between decoupled components, enabling high scalability and responsiveness
  • Service-oriented architecture (SOA) organizes distributed systems as interoperable services, promoting reusability and loose coupling
  • Serverless architecture relies on cloud providers to manage the underlying infrastructure, allowing developers to focus on writing and deploying code
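
The client-server pattern at the top of this list can be shown in a few lines: the client sends a request over the network, the server processes it and returns a result. A minimal sketch using Python's standard `socket` module (loopback address and the `echo:` reply format are illustrative choices):

```python
import socket
import threading

# Minimal client-server exchange over TCP: the server handles one
# request and returns a result; the client blocks until the reply arrives.

def serve_once(server_sock):
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(("echo:" + request).encode())  # "process" and reply

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024).decode()
client.close()
server.close()
print(reply)  # -> echo:hello
```

The other architectures in this list build on the same request/response primitive: a three-tier system inserts an application layer between client and data store, and microservices run many such small servers behind well-defined APIs.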

Cloud Computing Fundamentals

  • Cloud computing delivers computing resources (servers, storage, applications) over the internet on a pay-per-use basis
  • Essential characteristics of cloud computing include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service
  • Service models in cloud computing:
    • Infrastructure as a Service (IaaS) provides virtualized computing resources (VMs, storage, networks)
    • Platform as a Service (PaaS) offers a development and deployment environment for applications
    • Software as a Service (SaaS) delivers software applications over the internet
  • Deployment models in cloud computing:
    • Public cloud is owned and operated by a third-party provider, offering services to the general public
    • Private cloud is dedicated to a single organization, providing more control and security
    • Hybrid cloud combines public and private clouds, allowing workload portability and flexibility
  • Cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform) offer a wide range of services and tools for building and deploying distributed systems
  • Virtualization enables the creation of multiple virtual machines (VMs) on a single physical server, improving resource utilization and isolation
  • Containers (Docker) provide a lightweight alternative to VMs, packaging applications and their dependencies into portable units
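
Packaging an application into a container is typically described declaratively. A hypothetical Dockerfile for a small Python service (base image, file names, and port are illustrative assumptions, not from the source):

```dockerfile
# Illustrative Dockerfile: bundle an app and its dependencies into a
# portable image that runs the same way on any Docker host.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "app.py"]
```

Because the image carries its own dependencies, the same artifact can run unchanged on a laptop, a VM, or a managed cloud container service.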

Networking in Distributed Systems

  • Computer networks enable communication and resource sharing among nodes in a distributed system
  • Network protocols define the rules and formats for data transmission between nodes: TCP provides reliable, ordered, connection-oriented delivery, while UDP offers lightweight, connectionless, best-effort delivery
  • Remote procedure calls (RPC) allow nodes to invoke procedures on remote nodes as if they were local, abstracting network communication details
  • Message-oriented middleware (MOM) enables asynchronous communication between nodes using message queues and publish-subscribe models
  • Overlay networks create virtual topologies on top of physical networks, enabling efficient routing and content distribution in distributed systems
  • Content delivery networks (CDNs) distribute content across geographically dispersed servers, improving performance and availability for end-users
  • Software-defined networking (SDN) separates the control plane from the data plane, enabling centralized network management and programmability
  • Network security measures (firewalls, VPNs, encryption) protect distributed systems from unauthorized access and ensure data confidentiality and integrity
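
The RPC idea above, invoking a remote procedure as if it were local, can be demonstrated with Python's standard `xmlrpc` modules (the `add` function and loopback setup are illustrative):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# RPC sketch: the proxy makes a network call look like a local function
# call, hiding serialization and transport from the caller.

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)   # reads like a local call; runs on the server
print(result)              # -> 5
server.shutdown()
```

Production RPC systems (gRPC, Thrift) follow the same shape but add typed interface definitions, efficient binary encodings, and explicit failure handling, since a remote call can fail in ways a local call cannot.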

Data Management and Storage

  • Distributed databases store and manage data across multiple nodes, providing scalability, fault tolerance, and high availability
  • Replication involves creating multiple copies of data across nodes to ensure data availability and fault tolerance
    • Replication strategies include leader-follower (also called master-slave), multi-master, and peer-to-peer replication
  • Partitioning (sharding) divides data into smaller subsets and distributes them across nodes, enabling horizontal scalability
  • Distributed file systems (HDFS, GFS) store and manage large datasets across multiple nodes, providing fault tolerance and parallel processing capabilities
  • NoSQL databases (MongoDB, Cassandra) offer flexible data models and high scalability for handling unstructured and semi-structured data
  • Distributed caching (Redis, Memcached) improves performance by storing frequently accessed data in memory across multiple nodes
  • Distributed transaction processing ensures the consistency and integrity of data in the presence of concurrent access and failures
  • Data consistency models (strong consistency, eventual consistency) define the guarantees provided by a distributed storage system regarding data updates
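
Partitioning is often implemented with consistent hashing, which keeps most keys in place when nodes join or leave. A minimal sketch (node names and virtual-node count are illustrative; a naive `hash(key) % N` scheme would instead remap almost every key on resize):

```python
import bisect
import hashlib

# Consistent hashing: nodes and keys are placed on a ring; each key is
# owned by the next node clockwise. Adding or removing a node only
# remaps the keys on that node's arc.

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # vnodes: virtual nodes per physical node, smoothing the balance
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))   # deterministic placement on one node
```

Systems such as Cassandra and many distributed caches use variations of this ring to decide which node stores which shard.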

Scalability and Load Balancing

  • Scalability refers to a system's ability to handle increased workload by adding more resources (horizontal scaling) or increasing existing resources (vertical scaling)
  • Load balancing distributes incoming requests across multiple nodes to optimize resource utilization and improve performance
    • Load balancing algorithms include round-robin, least connections, and IP hash
  • Elastic scaling automatically adjusts the number of resources based on the workload, ensuring optimal performance and cost-efficiency
  • Caching reduces the load on backend systems by storing frequently accessed data in memory, improving response times and reducing network traffic
  • Content delivery networks (CDNs) distribute content across geographically dispersed servers, reducing latency and improving scalability for content delivery
  • Distributed message queues (Kafka, RabbitMQ) decouple producers and consumers, enabling asynchronous processing and improving scalability
  • Autoscaling automatically adjusts the number of instances based on predefined metrics (CPU utilization, request rate) to handle varying workloads
  • Capacity planning involves estimating future resource requirements and ensuring that the system can handle the expected workload
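
Two of the load-balancing algorithms named above can be sketched against a fixed backend pool (the addresses and connection counts are made up for illustration):

```python
import itertools

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round-robin: hand out backends in a fixed rotation.
rr = itertools.cycle(backends)
assigned = [next(rr) for _ in range(5)]
print(assigned)
# -> ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']

# Least connections: pick the backend with the fewest active connections.
active = {"10.0.0.1": 7, "10.0.0.2": 2, "10.0.0.3": 5}
target = min(active, key=active.get)
print(target)  # -> 10.0.0.2
```

Round-robin is stateless and fair when requests are uniform; least-connections adapts when some requests are much heavier than others. IP hash, the third algorithm listed, instead maps each client address to a fixed backend, preserving session affinity.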

Security and Privacy Concerns

  • Authentication verifies the identity of users or nodes in a distributed system, ensuring that only authorized entities can access resources
  • Authorization controls access to resources based on the authenticated identity, enforcing access control policies
  • Encryption protects data confidentiality by converting plaintext into ciphertext, preventing unauthorized access to sensitive information
    • Symmetric encryption uses the same key for encryption and decryption (AES; the older DES is now considered insecure)
    • Asymmetric encryption uses a pair of keys: a public key for encryption and a private key for decryption (RSA, ECC)
  • Secure communication protocols (TLS, and HTTPS, which runs HTTP over TLS; the older SSL versions are deprecated) establish secure channels for data transmission, protecting against eavesdropping and tampering
  • Distributed denial-of-service (DDoS) attacks overwhelm a system with a flood of traffic, disrupting its availability
    • DDoS mitigation techniques include traffic filtering, rate limiting, and using content delivery networks (CDNs)
  • Data privacy regulations (GDPR, CCPA) impose requirements on the collection, storage, and processing of personal data in distributed systems
  • Secure key management involves the generation, distribution, storage, and rotation of cryptographic keys used for encryption and authentication
  • Auditing and logging mechanisms track system activities and detect security breaches or unauthorized access attempts

Real-World Applications and Case Studies

  • Distributed databases (Cassandra, MongoDB) are used by companies like Netflix and eBay to store and manage large-scale data across multiple nodes
  • Blockchain technology (Bitcoin, Ethereum) enables decentralized applications (DApps) and secure, transparent transactions without intermediaries
  • Distributed file systems (HDFS) and processing frameworks (MapReduce, Spark) power big data analytics at companies like Facebook and Uber
  • Content delivery networks (Akamai, Cloudflare) optimize content delivery for streaming platforms (YouTube, Netflix) and e-commerce websites (Amazon)
  • Microservices architecture is adopted by companies like Amazon, Netflix, and Spotify to build scalable and maintainable applications
  • Peer-to-peer networks (BitTorrent, IPFS) enable efficient file sharing and content distribution without central servers
  • Edge computing brings computation and data storage closer to the source of data, enabling low-latency applications like autonomous vehicles and IoT devices
  • Distributed machine learning frameworks (TensorFlow, PyTorch) allow training and deployment of machine learning models across multiple nodes


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.