All Study Guides Intro to Database Systems Unit 1
💾 Intro to Database Systems Unit 1 – Introduction to Database SystemsDatabases are the backbone of modern information systems, organizing and managing vast amounts of data efficiently. This unit introduces key concepts like database models, SQL, and ACID properties, laying the foundation for understanding how data is structured, queried, and maintained in various applications.
From relational databases to query optimization, this unit covers essential topics for designing and working with databases. Students will learn about normalization, indexing, and transaction management, gaining practical skills for creating robust and performant database systems in real-world scenarios.
Key Concepts and Terminology
Database stores and organizes data in a structured format for efficient retrieval and manipulation
DBMS (Database Management System) software system that manages databases, provides interfaces for users and applications to interact with the database
Schema defines the structure, organization, and constraints of a database
Data model abstract model that organizes data elements, standardizes how the data elements relate to one another (relational, hierarchical, network)
SQL (Structured Query Language) standardized language for managing and querying relational databases
DDL (Data Definition Language) subset of SQL used to define and modify database schema
DML (Data Manipulation Language) subset of SQL used to insert, update, and delete data in a database
ACID (Atomicity, Consistency, Isolation, Durability) set of properties that guarantee reliable processing of database transactions
Normalization process of organizing data in a database to minimize redundancy and dependency
Database Models and Architecture
Relational model organizes data into tables (relations) consisting of rows (tuples) and columns (attributes)
Tables are related to each other through common attributes (keys)
Ensures data integrity and reduces data redundancy
Hierarchical model organizes data in a tree-like structure with parent-child relationships
Each child record has only one parent record
Suitable for representing data with inherent hierarchical relationships (organizational structure)
Network model extension of the hierarchical model that allows a child record to have multiple parent records
Represents complex relationships between data elements
Navigational approach to accessing data
Object-oriented model represents data as objects, which encapsulate data and behavior
Supports inheritance, polymorphism, and encapsulation
Suitable for complex applications with rich data types and relationships
Client-server architecture separates the database server from the client applications
Server manages the database and processes client requests
Clients interact with the server through a network connection
Three-tier architecture adds a middle tier (application server) between the client and the database server
Application server handles business logic and data processing
Improves scalability, security, and maintainability
Relational Database Fundamentals
Table (relation) fundamental building block of a relational database
Consists of rows (tuples) and columns (attributes)
Each row represents a unique instance of an entity
Each column represents a specific attribute or characteristic of the entity
Primary key unique identifier for each row in a table
Can be a single column or a combination of columns
Ensures data integrity and facilitates relationships between tables
Foreign key column in a table that references the primary key of another table
Establishes a relationship between two tables
Maintains referential integrity
Relationship association between two or more tables based on common attributes
One-to-one each row in one table is related to at most one row in another table
One-to-many each row in one table can be related to multiple rows in another table
Many-to-many multiple rows in one table can be related to multiple rows in another table
Join operation that combines rows from two or more tables based on a related column
Inner join returns only the rows that have matching values in both tables
Outer join (left, right, full) returns all rows from one table and the matching rows from the other table
SQL Basics and Querying
SELECT retrieves data from one or more tables
WHERE filters rows based on a specified condition
JOIN combines rows from two or more tables based on a related column
GROUP BY groups rows that have the same values in specified columns
HAVING filters groups based on a specified condition
ORDER BY sorts the result set based on specified columns
Database Design and Normalization
Conceptual design high-level design that identifies entities, attributes, and relationships
Entity-Relationship (ER) model graphical representation of entities, attributes, and relationships
Entities objects or concepts that are relevant to the database (student, course)
Attributes characteristics or properties of an entity (name, ID, email)
Relationships associations between entities (student enrolls in a course)
Logical design translates the conceptual design into a database schema
Mapping ER model to relational model
Identifying tables, columns, primary keys, and foreign keys
Applying normalization techniques to minimize data redundancy and dependency
Normalization process of organizing data in a database to minimize redundancy and dependency
First Normal Form (1NF) each column contains atomic values, no repeating groups
Second Normal Form (2NF) 1NF and no partial dependencies (non-prime attributes depend on the entire primary key)
Third Normal Form (3NF) 2NF and no transitive dependencies (non-prime attributes do not depend on other non-prime attributes)
Denormalization intentional introduction of redundancy to improve query performance
Trade-off between data integrity and query efficiency
Used in read-heavy databases or when complex joins are required
Data Integrity and Constraints
Data integrity accuracy, consistency, and validity of data in a database
Domain integrity ensures that the values in a column are within a defined set or range
Entity integrity ensures that each row in a table is uniquely identified by a primary key
Referential integrity ensures that the relationships between tables are consistent and valid
Constraints rules that enforce data integrity in a database
Primary key constraint ensures that the values in a column or a set of columns are unique and not null
Foreign key constraint ensures that the values in a column or a set of columns match the values in the primary key of another table
Unique constraint ensures that the values in a column or a set of columns are unique
Check constraint ensures that the values in a column satisfy a specified condition
Not null constraint ensures that a column does not contain null values
Cascading actions automatic propagation of changes in a parent table to the related child tables
ON DELETE CASCADE deletes the corresponding rows in the child table when a row in the parent table is deleted
ON UPDATE CASCADE updates the corresponding rows in the child table when a row in the parent table is updated
Triggers special types of stored procedures that are automatically executed in response to specific database events
Used to enforce complex business rules or maintain data consistency
Types: BEFORE INSERT, AFTER INSERT, BEFORE UPDATE, AFTER UPDATE, BEFORE DELETE, AFTER DELETE
Index data structure that improves the speed of data retrieval operations on a database table
Allows the database to find specific rows quickly without scanning the entire table
Types: clustered index (determines the physical order of data), non-clustered index (separate structure from the data)
B-tree index balanced tree data structure used for efficient searching and sorting
Maintains sorted order of keys, allowing for fast lookup, insertion, and deletion
Commonly used for primary keys and unique indexes
Hash index data structure that uses a hash function to compute the index of a data record
Provides fast access to data based on key values
Suitable for equality comparisons but not for range queries
Query optimization process of choosing the most efficient execution plan for a given SQL query
Query optimizer analyzes the query, considers various execution plans, and selects the plan with the lowest estimated cost
Techniques: index selection, join order optimization, query rewriting
Explain plan tool that provides information about how the database will execute a given SQL query
Shows the execution steps, join order, and access methods used
Helps identify performance bottlenecks and optimize queries
Partitioning technique of dividing large tables into smaller, more manageable parts
Improves query performance by reducing the amount of data scanned
Types: range partitioning, hash partitioning, list partitioning
Transaction Management and Concurrency
Transaction logical unit of work that consists of one or more database operations
ACID properties ensure reliable processing of transactions
Atomicity ensures that all operations in a transaction are treated as a single unit (either all are completed or none are)
Consistency ensures that the database remains in a valid state before and after a transaction
Isolation ensures that concurrent transactions do not interfere with each other
Durability ensures that the changes made by a committed transaction are permanent
Concurrency control mechanism that manages simultaneous access to the database by multiple transactions
Ensures data consistency and prevents conflicts between concurrent transactions
Techniques: locking, timestamping, optimistic concurrency control
Locking mechanism that restricts access to a data item when a transaction is using it
Shared lock (read lock) allows multiple transactions to read the same data item simultaneously
Exclusive lock (write lock) allows only one transaction to modify a data item at a time
Deadlock situation where two or more transactions are waiting for each other to release locks, resulting in a circular dependency
Detection techniques: timeout, wait-for graph
Prevention techniques: two-phase locking, resource ordering
Isolation levels define the degree to which a transaction is isolated from other concurrent transactions
Read uncommitted allows dirty reads (reading uncommitted changes)
Read committed prevents dirty reads but allows non-repeatable reads and phantom reads
Repeatable read prevents dirty reads and non-repeatable reads but allows phantom reads
Serializable highest isolation level, prevents all concurrency issues (dirty reads, non-repeatable reads, phantom reads)
Logging mechanism that records the changes made by transactions to ensure durability and support recovery
Redo log contains information necessary to redo changes made by committed transactions
Undo log contains information necessary to undo changes made by aborted transactions