Intro to Database Systems

💾Intro to Database Systems Unit 6 – SQL Data Manipulation Language (DML)

SQL Data Manipulation Language (DML) is the backbone of database interactions. It allows users to query, insert, update, and delete data in relational databases. These commands are essential for managing and analyzing data stored in tables. Mastering DML is crucial for anyone working with databases. From retrieving specific information with SELECT to modifying existing data with UPDATE, DML commands provide the tools needed to effectively manipulate and maintain data in relational database systems.

What's SQL DML?

  • SQL Data Manipulation Language (DML) consists of commands used to manipulate and retrieve data stored in a relational database
  • DML focuses on querying, inserting, updating, and deleting data within database tables
  • Enables users to interact with the data and perform essential operations for data management and analysis
  • DML commands are executed on existing tables and do not modify the database schema or structure
  • Used in conjunction with Data Definition Language (DDL) commands that define and modify the database structure (tables, indexes, constraints)
  • DML statements are typically used by application developers, data analysts, and database administrators to work with data
  • Proficiency in DML is crucial for effective data retrieval, manipulation, and maintenance in relational databases

Key DML Commands

  • SELECT retrieves data from one or more tables based on specified criteria
    • Allows filtering, sorting, and joining data from multiple tables
    • Can perform aggregate functions (COUNT, SUM, AVG) and grouping operations
  • INSERT adds new rows of data into a table
    • Specifies the table name and the values to be inserted for each column
    • Can insert a single row or multiple rows in a single statement
  • UPDATE modifies existing data in a table based on specified conditions
    • Changes the values of one or more columns in selected rows
    • Requires a WHERE clause to identify the rows to be updated
  • DELETE removes one or more rows from a table based on specified conditions
    • Permanently deletes data from the table
    • Requires a WHERE clause to identify the rows to be deleted
  • MERGE combines the functionality of INSERT, UPDATE, and DELETE into a single statement
    • Performs insert, update, or delete operations based on the existence of matching rows in the target table
  • TRUNCATE quickly removes all rows from a table, resetting it to an empty state
    • Faster than using DELETE without a WHERE clause
    • Cannot be rolled back and does not fire triggers

SELECT Statement Basics

  • SELECT is the most commonly used DML command for retrieving data from a database
  • Basic syntax:
    SELECT column1, column2, ... FROM table_name;
  • Retrieves data from specified columns of a table and returns a result set
  • Can retrieve all columns using the asterisk wildcard:
    SELECT * FROM table_name;
  • Supports arithmetic expressions and string concatenation in the SELECT list
  • Allows renaming columns using aliases with the AS keyword:
    SELECT column1 AS alias1, column2 AS alias2 FROM table_name;
  • Can apply functions to retrieved data (UPPER, LOWER, ROUND, DATE_FORMAT)
  • Enables filtering data using the WHERE clause to specify conditions:
    SELECT column1, column2 FROM table_name WHERE condition;

Filtering and Sorting Data

  • WHERE clause filters rows based on specified conditions
    • Conditions can include comparison operators (=, <>, <, >, <=, >=), logical operators (AND, OR, NOT), and pattern matching (LIKE, IN)
    • Example:
      SELECT * FROM employees WHERE salary > 50000 AND department = 'Sales';
  • ORDER BY clause sorts the result set based on one or more columns
    • Specifies the column(s) to sort by and the sort order (ASC for ascending, DESC for descending)
    • Example:
      SELECT * FROM products ORDER BY price DESC, name ASC;
  • LIMIT clause restricts the number of rows returned by the query
    • Useful for pagination or retrieving a subset of results
    • Example:
      SELECT * FROM customers LIMIT 10;
  • DISTINCT keyword removes duplicate rows from the result set
    • Considers all columns specified in the SELECT list for uniqueness
    • Example:
      SELECT DISTINCT city FROM addresses;
  • GROUP BY clause groups rows based on specified columns and performs aggregate functions
    • Often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to calculate summary values
    • Example:
      SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department;
  • HAVING clause filters groups based on aggregate function results
    • Similar to WHERE clause but applied after grouping and aggregation
    • Example:
      SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 60000;

Joining Tables

  • Joins combine rows from two or more tables based on a related column between them
  • Enable retrieving data from multiple tables in a single query
  • Types of joins:
    • INNER JOIN returns only the matching rows between the tables
      • Example:
        SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;
    • LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table and the matching rows from the right table, with NULL values for non-matching rows
      • Example:
        SELECT * FROM customers LEFT JOIN orders ON customers.customer_id = orders.customer_id;
    • RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and the matching rows from the left table, with NULL values for non-matching rows
      • Example:
        SELECT * FROM orders RIGHT JOIN customers ON orders.customer_id = customers.customer_id;
    • FULL JOIN (or FULL OUTER JOIN) returns all rows from both tables, with NULL values for non-matching rows
      • Example:
        SELECT * FROM customers FULL JOIN orders ON customers.customer_id = orders.customer_id;
  • ON clause specifies the join condition between the tables
  • Joins can be chained to combine multiple tables in a single query
  • Self-joins allow joining a table with itself to compare rows within the same table

Modifying Data with INSERT, UPDATE, and DELETE

  • INSERT adds new rows to a table
    • Specifies the table name and the values to be inserted for each column
    • Example:
      INSERT INTO employees (first_name, last_name, email) VALUES ('John', 'Doe', 'john.doe@example.com');
    • Can insert multiple rows using a single INSERT statement by providing multiple sets of values
    • Can insert data from a SELECT statement using
      INSERT INTO table_name (column1, column2, ...) SELECT ...;
  • UPDATE modifies existing data in a table based on specified conditions
    • Changes the values of one or more columns in selected rows
    • Requires a WHERE clause to identify the rows to be updated
    • Example:
      UPDATE products SET price = price * 1.1 WHERE category = 'Electronics';
    • Without a WHERE clause, UPDATE will modify all rows in the table
  • DELETE removes one or more rows from a table based on specified conditions
    • Permanently deletes data from the table
    • Requires a WHERE clause to identify the rows to be deleted
    • Example:
      DELETE FROM customers WHERE last_purchase_date < '2020-01-01';
    • Without a WHERE clause, DELETE will remove all rows from the table
  • It is important to use WHERE clauses carefully with UPDATE and DELETE to avoid unintended data modifications
  • Transactions (BEGIN, COMMIT, ROLLBACK) ensure data integrity and allow rolling back changes if needed

Advanced DML Techniques

  • Subqueries are queries nested within another query
    • Can be used in SELECT, FROM, WHERE, and HAVING clauses
    • Allow using the result of one query as input to another query
    • Example:
      SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
  • Correlated subqueries are subqueries that reference columns from the outer query
    • Executed for each row in the outer query
    • Example:
      SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = e.department);
  • EXISTS and NOT EXISTS operators check for the existence of rows satisfying a subquery condition
    • Example:
      SELECT * FROM customers WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.customer_id);
  • CASE expressions allow conditional processing within a query
    • Provide if-then-else logic to perform different actions based on conditions
    • Example:
      SELECT product_name, price, CASE WHEN price > 100 THEN 'Expensive' ELSE 'Affordable' END AS price_category FROM products;
  • Window functions perform calculations across a set of rows related to the current row
    • Include ranking functions (ROW_NUMBER, RANK, DENSE_RANK), aggregate functions (SUM, AVG), and analytic functions (LEAD, LAG)
    • Example:
      SELECT employee_id, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM employees;
  • Common Table Expressions (CTEs) are named temporary result sets within a query
    • Defined using the WITH clause and can be referenced multiple times in the main query
    • Improve query readability and maintainability
    • Example:
      WITH high_salary_employees AS (SELECT * FROM employees WHERE salary > 100000) SELECT * FROM high_salary_employees WHERE department = 'IT';

Common Pitfalls and Best Practices

  • Always use explicit column names in INSERT statements instead of relying on column order
  • Be cautious when using UPDATE and DELETE without a WHERE clause to avoid unintended data modifications
  • Use meaningful and descriptive names for tables, columns, and aliases to enhance code readability
  • Avoid using SELECT * in production queries and specify only the required columns
  • Use parameterized queries or prepared statements to prevent SQL injection vulnerabilities
  • Optimize queries by adding appropriate indexes on frequently used columns in WHERE, JOIN, and ORDER BY clauses
  • Avoid using functions or calculations in WHERE clauses as they may prevent the use of indexes
  • Break down complex queries into smaller, manageable parts using subqueries or CTEs for better readability and maintenance
  • Use LIMIT or TOP clauses to retrieve only the necessary number of rows, especially when dealing with large datasets
  • Regularly monitor and analyze query performance using tools like EXPLAIN or execution plans to identify and optimize slow queries
  • Follow consistent naming conventions for tables, columns, and other database objects
  • Implement proper error handling and logging mechanisms in application code that interacts with the database
  • Regularly backup your database to prevent data loss in case of failures or accidents


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.