💾Intro to Database Systems Unit 6 – SQL Data Manipulation Language (DML)
SQL Data Manipulation Language (DML) is the backbone of database interactions. It allows users to query, insert, update, and delete data in relational databases. These commands are essential for managing and analyzing data stored in tables.
Mastering DML is crucial for anyone working with databases. From retrieving specific information with SELECT to modifying existing data with UPDATE, DML commands provide the tools needed to effectively manipulate and maintain data in relational database systems.
SQL Data Manipulation Language (DML) consists of commands used to manipulate and retrieve data stored in a relational database
DML focuses on querying, inserting, updating, and deleting data within database tables
Enables users to interact with the data and perform essential operations for data management and analysis
DML commands are executed on existing tables and do not modify the database schema or structure
Used in conjunction with Data Definition Language (DDL) commands that define and modify the database structure (tables, indexes, constraints)
DML statements are typically used by application developers, data analysts, and database administrators to work with data
Proficiency in DML is crucial for effective data retrieval, manipulation, and maintenance in relational databases
Key DML Commands
SELECT retrieves data from one or more tables based on specified criteria
Allows filtering, sorting, and joining data from multiple tables
Can perform aggregate functions (COUNT, SUM, AVG) and grouping operations
INSERT adds new rows of data into a table
Specifies the table name and the values to be inserted for each column
Can insert a single row or multiple rows in a single statement
UPDATE modifies existing data in a table based on specified conditions
Changes the values of one or more columns in selected rows
Requires a WHERE clause to identify the rows to be updated
DELETE removes one or more rows from a table based on specified conditions
Permanently deletes data from the table
Requires a WHERE clause to identify the rows to be deleted
MERGE combines the functionality of INSERT, UPDATE, and DELETE into a single statement
Performs insert, update, or delete operations based on the existence of matching rows in the target table
TRUNCATE quickly removes all rows from a table, resetting it to an empty state
Faster than using DELETE without a WHERE clause
Cannot be rolled back and does not fire triggers
SELECT Statement Basics
SELECT is the most commonly used DML command for retrieving data from a database
Basic syntax:
SELECT column1, column2, ... FROM table_name;
Retrieves data from specified columns of a table and returns a result set
Can retrieve all columns using the asterisk wildcard:
SELECT * FROM table_name;
Supports arithmetic expressions and string concatenation in the SELECT list
Allows renaming columns using aliases with the AS keyword:
SELECT column1 AS alias1, column2 AS alias2 FROM table_name;
Can apply functions to retrieved data (UPPER, LOWER, ROUND, DATE_FORMAT)
Enables filtering data using the WHERE clause to specify conditions:
SELECT column1, column2 FROM table_name WHERE condition;
Filtering and Sorting Data
WHERE clause filters rows based on specified conditions
Conditions can include comparison operators (=, <>, <, >, <=, >=), logical operators (AND, OR, NOT), and pattern matching (LIKE, IN)
Example:
SELECT * FROM employees WHERE salary > 50000 AND department = 'Sales';
ORDER BY clause sorts the result set based on one or more columns
Specifies the column(s) to sort by and the sort order (ASC for ascending, DESC for descending)
Example:
SELECT * FROM products ORDER BY price DESC, name ASC;
LIMIT clause restricts the number of rows returned by the query
Useful for pagination or retrieving a subset of results
Example:
SELECT * FROM customers LIMIT 10;
DISTINCT keyword removes duplicate rows from the result set
Considers all columns specified in the SELECT list for uniqueness
Example:
SELECT DISTINCT city FROM addresses;
GROUP BY clause groups rows based on specified columns and performs aggregate functions
Often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to calculate summary values
Example:
SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department;
HAVING clause filters groups based on aggregate function results
Similar to WHERE clause but applied after grouping and aggregation
Example:
SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 60000;
Joining Tables
Joins combine rows from two or more tables based on a related column between them
Enable retrieving data from multiple tables in a single query
Types of joins:
INNER JOIN returns only the matching rows between the tables
Example:
SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;
LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table and the matching rows from the right table, with NULL values for non-matching rows
Example:
SELECT * FROM customers LEFT JOIN orders ON customers.customer_id = orders.customer_id;
RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and the matching rows from the left table, with NULL values for non-matching rows
Example:
SELECT * FROM orders RIGHT JOIN customers ON orders.customer_id = customers.customer_id;
FULL JOIN (or FULL OUTER JOIN) returns all rows from both tables, with NULL values for non-matching rows
Example:
SELECT * FROM customers FULL JOIN orders ON customers.customer_id = orders.customer_id;
ON clause specifies the join condition between the tables
Joins can be chained to combine multiple tables in a single query
Self-joins allow joining a table with itself to compare rows within the same table
Modifying Data with INSERT, UPDATE, and DELETE
INSERT adds new rows to a table
Specifies the table name and the values to be inserted for each column
Example:
INSERT INTO employees (first_name, last_name, email) VALUES ('John', 'Doe', 'john.doe@example.com');
Can insert multiple rows using a single INSERT statement by providing multiple sets of values
Can insert data from a SELECT statement using
INSERT INTO table_name (column1, column2, ...) SELECT ...;
UPDATE modifies existing data in a table based on specified conditions
Changes the values of one or more columns in selected rows
Requires a WHERE clause to identify the rows to be updated
Example:
UPDATE products SET price = price * 1.1 WHERE category = 'Electronics';
Without a WHERE clause, UPDATE will modify all rows in the table
DELETE removes one or more rows from a table based on specified conditions
Permanently deletes data from the table
Requires a WHERE clause to identify the rows to be deleted
Example:
DELETE FROM customers WHERE last_purchase_date < '2020-01-01';
Without a WHERE clause, DELETE will remove all rows from the table
It is important to use WHERE clauses carefully with UPDATE and DELETE to avoid unintended data modifications
Transactions (BEGIN, COMMIT, ROLLBACK) ensure data integrity and allow rolling back changes if needed
Advanced DML Techniques
Subqueries are queries nested within another query
Can be used in SELECT, FROM, WHERE, and HAVING clauses
Allow using the result of one query as input to another query
Example:
SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
Correlated subqueries are subqueries that reference columns from the outer query
Executed for each row in the outer query
Example:
SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = e.department);
EXISTS and NOT EXISTS operators check for the existence of rows satisfying a subquery condition
Example:
SELECT * FROM customers WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.customer_id);
CASE expressions allow conditional processing within a query
Provide if-then-else logic to perform different actions based on conditions
Example:
SELECT product_name, price, CASE WHEN price > 100 THEN 'Expensive' ELSE 'Affordable' END AS price_category FROM products;
Window functions perform calculations across a set of rows related to the current row
Include ranking functions (ROW_NUMBER, RANK, DENSE_RANK), aggregate functions (SUM, AVG), and analytic functions (LEAD, LAG)
Example:
SELECT employee_id, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM employees;
Common Table Expressions (CTEs) are named temporary result sets within a query
Defined using the WITH clause and can be referenced multiple times in the main query
Improve query readability and maintainability
Example:
WITH high_salary_employees AS (SELECT * FROM employees WHERE salary > 100000) SELECT * FROM high_salary_employees WHERE department = 'IT';
Common Pitfalls and Best Practices
Always use explicit column names in INSERT statements instead of relying on column order
Be cautious when using UPDATE and DELETE without a WHERE clause to avoid unintended data modifications
Use meaningful and descriptive names for tables, columns, and aliases to enhance code readability
Avoid using SELECT * in production queries and specify only the required columns
Use parameterized queries or prepared statements to prevent SQL injection vulnerabilities
Optimize queries by adding appropriate indexes on frequently used columns in WHERE, JOIN, and ORDER BY clauses
Avoid using functions or calculations in WHERE clauses as they may prevent the use of indexes
Break down complex queries into smaller, manageable parts using subqueries or CTEs for better readability and maintenance
Use LIMIT or TOP clauses to retrieve only the necessary number of rows, especially when dealing with large datasets
Regularly monitor and analyze query performance using tools like EXPLAIN or execution plans to identify and optimize slow queries
Follow consistent naming conventions for tables, columns, and other database objects
Implement proper error handling and logging mechanisms in application code that interacts with the database
Regularly backup your database to prevent data loss in case of failures or accidents