What is AP Cybersecurity unit 5?
Data is the target. Every attack covered in this unit, whether it steals, modifies, or destroys information, ultimately aims at the confidentiality, integrity, or availability of data and the applications that process it. Unit 5 builds a complete defensive picture: understand the attack, classify the data, apply the right control, encrypt what matters, and detect what gets through.
Unit 5 teaches how adversaries exploit application and file vulnerabilities, how defenders classify data and apply access controls and cryptography to protect it, and how detective controls like log analysis and honeypots identify attacks that succeed.
Attacks on applications and data
SQL injection, cross-site scripting (XSS), buffer overflow, and directory traversal all exploit applications that fail to validate or sanitize user input. Unencrypted files and weak access control settings give adversaries easy access to sensitive data without needing a sophisticated exploit.
Cryptography for protection
Symmetric encryption (AES) uses one shared key and is fast for bulk data. Asymmetric encryption (RSA, ECC) uses a public/private key pair so parties can communicate securely without sharing a secret in advance. Key length determines keyspace size: an n-bit key has a keyspace of 2^n, and longer keys are harder to brute-force.
Detection when prevention fails
Log analysis (accounting), honeypots, cryptographic hash verification, and data loss prevention (DLP) services form the detection layer. Each tool has trade-offs in cost, speed, and the types of attacks it can catch, including important blind spots like hash functions that cannot detect read-only data theft.
Defense in depth for data
No single control is enough. Unit 5 shows how layering access controls, encryption, secure coding practices, and detective tools creates overlapping protection. When one layer fails, another layer catches the attack or limits its impact. Understanding why each layer exists, and what it cannot do, is the core skill this unit develops.
Unit 5 review notes
5.1
Application and Data Vulnerabilities and Attacks
Adversaries exploit three main weaknesses: unencrypted files readable by anyone with device access, overly permissive account privileges that give elevated access when a user account is compromised, and applications that fail to validate user input. The four major injection-style attacks all stem from that last failure.
- SQL injection: Adversary inserts SQL control words (WHERE, OR 1=1, --) into an input field to manipulate a database query and read, modify, or delete records.
- Cross-site scripting (XSS): Adversary injects a <script> tag into a web application so malicious code runs in another user's browser, stealing session data or redirecting the user.
- Buffer overflow: Adversary sends more data than an input field can hold, overwriting adjacent memory and potentially executing arbitrary code.
- Directory traversal: Adversary uses ../ sequences in a file path to navigate outside the intended directory and access restricted files on the server.
- Risk assessment (CIA): Data vulnerability risk is rated by combining the sensitivity of the data with the likelihood of exploit, measured against confidentiality, integrity, and availability impacts.
Can you explain why an application that does not validate user input is vulnerable to all four attack types listed above?
| Attack | What it exploits | Primary CIA impact |
|---|---|---|
| SQL injection | Unvalidated input passed to a database | Confidentiality / Integrity |
| XSS | Unvalidated input rendered in a browser | Confidentiality |
| Buffer overflow | Input field with no size limit | Integrity / Availability |
| Directory traversal | Unvalidated file path input | Confidentiality |
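The SQL injection row above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the table, the user names, and both lookup functions are hypothetical and exist only for illustration. The parameterized query shown as the fix is a common defense that these notes do not name explicitly, so treat it as one possible approach rather than the course's prescribed method.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # UNSAFE: user input is pasted directly into the SQL text,
    # so control words in the input become part of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_parameterized(conn, name):
    # The driver passes the input as data, never as SQL syntax.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = build_demo_db()
payload = "nobody' OR 1=1 --"  # classic injection string from the notes

leaked = lookup_vulnerable(conn, payload)
safe = lookup_parameterized(conn, payload)

print("vulnerable lookup returned", len(leaked), "rows")
print("parameterized lookup returned", len(safe), "rows")
```

In the vulnerable version, the payload turns the WHERE clause into a condition that is always true and comments out the trailing quote, so every record is returned. In the parameterized version, the same string is treated as a literal name that matches nobody.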
5.2
Data States, Managerial Controls, and Access Control Models
Protecting data starts with knowing what state it is in and what regulations apply. Organizations then layer managerial policies and access control models on top to limit who can do what to which files.
- Data at rest / in transit / in use: At rest: stored on a drive, protected by encryption and physical security. In transit: moving over a network, protected by encryption and secure channels. In use: being processed, protected by access controls.
- PII / PHI / PCI: Regulated data categories. PII is personally identifiable information; PHI is protected health information (HIPAA); PCI is payment card information (PCI-DSS). Each carries legal requirements for how data must be protected.
- Role-based access control (RBAC): Assigns subjects to roles (e.g., accountant) and grants roles access to objects (e.g., payroll software). Simple to manage at scale.
- Rule-based access control (RuBAC): Checks a set of rules to allow or deny access dynamically, such as time-of-day restrictions.
- Linux file permissions (chmod): Three permission types (read r, write w, execute x) set for three entities (owner, group, others). Example: chmod 750 grants rwx to owner, r-x to group, and no access to others.
Given a Linux permission string like rw-r--r--, identify what each segment means and which entity it applies to.
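As a rough aid for this kind of question, the sketch below converts an octal mode such as 750 into its nine-character permission string and splits that string by entity. The function names are hypothetical helpers written for this example; they are not part of chmod or of any standard tool.

```python
def octal_to_symbolic(mode: str) -> str:
    """Turn an octal mode like '750' into a string like 'rwxr-x---'."""
    letters = "rwx"
    out = []
    for digit in mode:
        value = int(digit, 8)
        for i, letter in enumerate(letters):
            # Bit values: 4 = read, 2 = write, 1 = execute.
            out.append(letter if value & (4 >> i) else "-")
    return "".join(out)


def split_by_entity(symbolic: str) -> dict:
    """Split a nine-character permission string into its three segments."""
    return {
        "owner": symbolic[0:3],
        "group": symbolic[3:6],
        "others": symbolic[6:9],
    }


print(octal_to_symbolic("750"))
print(split_by_entity("rw-r--r--"))
```

Each octal digit is the sum of the permission values it grants, which is why 7 means rwx, 5 means r-x, and 0 means no access.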
| Access Control Model | How access is determined | Best used when |
|---|---|---|
| RBAC | Subject's assigned role | Large organizations with defined job functions |
| RuBAC | Evaluation of a rule set | Time-based or context-based restrictions are needed |
| MAC (Mandatory) | Data classification labels matched to subject clearance | High-security environments like government systems |
| DAC (Discretionary) | File owner sets permissions | Small teams where owners manage their own files |
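The difference between the first two rows of the table can be sketched in a few lines. Everything here is an assumption made for illustration: the user names, the role table, and the 08:00 to 18:00 business-hours rule are invented, and a real system would load them from a directory service or policy engine instead of hard-coding them.

```python
from datetime import time

# Hypothetical role assignments and role permissions (RBAC data).
USER_ROLES = {"dana": "accountant", "lee": "billing_clerk"}
ROLE_PERMISSIONS = {
    "accountant": {"payroll_software"},
    "billing_clerk": {"billing_records"},
}


def rbac_allows(user: str, obj: str) -> bool:
    """RBAC: access depends only on the role assigned to the subject."""
    role = USER_ROLES.get(user)
    return obj in ROLE_PERMISSIONS.get(role, set())


def rubac_allows(current: time, start=time(8, 0), end=time(18, 0)) -> bool:
    """RuBAC: access depends on a rule, here an assumed time-of-day window."""
    return start <= current <= end


def access_allowed(user: str, obj: str, current: time) -> bool:
    """Layer both checks: the role must permit it and the rule must pass."""
    return rbac_allows(user, obj) and rubac_allows(current)


print(access_allowed("dana", "payroll_software", time(10, 30)))
print(access_allowed("dana", "payroll_software", time(23, 0)))
print(access_allowed("lee", "payroll_software", time(10, 30)))
```

The role check answers "who is this subject", while the rule check answers "do current conditions permit access", so the two models are often combined rather than chosen between.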
5.3
Symmetric Encryption and Protecting Stored Data
Cryptography hides information by transforming plaintext into ciphertext using an algorithm and a key. Symmetric encryption uses the same key to encrypt and decrypt, making it fast but requiring a secure way to share that key in advance.
- Plaintext / ciphertext: Plaintext is the original readable data. Ciphertext is the scrambled output after encryption. Decryption reverses the process using the correct key.
- Keyspace: The total number of possible keys for an algorithm. An n-bit key has a keyspace of 2^n. Larger keyspaces make brute-force attacks slower.
- AES (Advanced Encryption Standard): The most common symmetric algorithm. Encrypts data in 128-bit blocks with key lengths of 128, 192, or 256 bits. Used for Wi-Fi, HTTPS, and file encryption.
- OpenSSL (symmetric use): Command-line tool for encrypting and decrypting files with AES. Encryption follows the pattern: openssl enc -aes-256-cbc -in file -out file.enc. Adding the -d flag to the same command decrypts instead.
- Block vs. stream cipher: Block ciphers like AES encrypt fixed-size chunks of data. Stream ciphers encrypt data one bit or byte at a time. AES is a block cipher.
Why does a longer AES key length increase security, and what is the trade-off?
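The keyspace arithmetic behind this question can be worked through directly. In the sketch below, the guessing rate of one billion keys per second is purely an assumed figure chosen to make the numbers readable, and the helper names are hypothetical. The average-guess count uses the fact that a brute-force search covers half the keyspace on average.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(bits: int) -> int:
    """Number of possible keys for an n-bit key: 2^n."""
    return 2 ** bits


def average_guesses(bits: int) -> int:
    """On average, brute force finds the key after half the keyspace."""
    return 2 ** (bits - 1)


def average_years(bits: int, guesses_per_second: float) -> float:
    """Average search time in years at an assumed guessing rate."""
    return average_guesses(bits) / guesses_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e9  # hypothetical attacker speed, keys per second

for bits in (40, 128, 256):
    print(bits, "bits:", keyspace(bits), "keys,",
          f"{average_years(bits, ASSUMED_RATE):.3g} years on average")
```

Adding a single bit doubles the keyspace, so each increase in key length multiplies the attacker's work while adding only a modest amount of computation for the legitimate user.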
5.4
Asymmetric Cryptography and Key Length
Asymmetric encryption solves the key-sharing problem by using a mathematically linked key pair. The public key encrypts; only the matching private key decrypts. This makes secure communication possible between parties who have never met.
- Public key / private key: Generated together as mathematical inverses. The public key is shared openly. The private key is kept secret. If the private key is compromised, the entire key pair must be discarded and regenerated.
- RSA: Common asymmetric algorithm. A private key is generated with openssl genrsa, and the matching public key is derived from it (for example with openssl rsa -pubout). Typical key lengths are 2048 or 4096 bits. Used for digital signatures and certificates.
- ECC (Elliptic Curve Cryptography): Asymmetric algorithm that achieves equivalent security to RSA with shorter key lengths, making it faster and more efficient on constrained devices.
- Keyspace and brute force: An n-bit key has a keyspace of 2^n. On average, a brute-force attack finds the key in 2^(n-1) guesses. Key-length comparisons are only valid within the same algorithm.
- Digital signatures: Asymmetric keys are used to sign data: the sender encrypts a hash of the message with their private key. The receiver decrypts it with the sender's public key to verify authenticity and integrity.
A sender wants to encrypt a message so only the receiver can read it. Which key does the sender use, and why?
| Property | Symmetric (AES) | Asymmetric (RSA / ECC) |
|---|---|---|
| Number of keys | One shared key | Key pair: public and private |
| Key sharing required? | Yes, must share secret key securely | No, public key can be distributed openly |
| Speed | Faster for large data | Slower; typically used for small data or key exchange |
| Common use | File encryption, Wi-Fi, HTTPS bulk data | Digital signatures, certificates, key exchange |
| Example algorithm | AES-256 | RSA-2048, ECC |
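To see how a public/private key pair relates, here is a toy RSA walk-through with deliberately tiny primes. It is a classroom sketch only: the primes, the exponent, and the stand-in digest value are chosen for illustration, and real RSA uses keys of 2048 bits or more together with padding and a real hash function, none of which appear here.

```python
# Toy RSA: tiny numbers for illustration only, never for real use.
p, q = 61, 53                 # two small primes (real keys use huge primes)
n = p * q                     # modulus, shared by both keys
phi = (p - 1) * (q - 1)       # used only while generating the key pair
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent (requires Python 3.8+)


def apply_key(value: int, exponent: int, modulus: int) -> int:
    """RSA's core operation: modular exponentiation."""
    return pow(value, exponent, modulus)


# Confidentiality: encrypt with the receiver's PUBLIC key (e, n),
# decrypt with the receiver's PRIVATE key (d, n).
message = 65
ciphertext = apply_key(message, e, n)
recovered = apply_key(ciphertext, d, n)

# Signature: the sender transforms a digest with their PRIVATE key,
# and anyone can check it with the sender's PUBLIC key.
digest = 123                  # stand-in for a real hash of the message
signature = apply_key(digest, d, n)
verified = apply_key(signature, e, n) == digest

print("ciphertext:", ciphertext)
print("recovered:", recovered)
print("signature verified:", verified)
```

The same pair of exponents is used in both directions: which key goes first determines whether the goal is secrecy (public key first) or proof of origin (private key first).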
5.5
Protecting Applications: Secure by Design and Input Sanitization
Application security begins before a single line of code is written. Secure by design embeds security into every phase of product development. Input sanitization is the technical mechanism that stops injection attacks at the point of entry.
- Secure by design: Security is a design principle built into all phases of development, not a feature added after launch. Includes three principles: take ownership of customer security outcomes, embrace radical transparency, and build organizational leadership around security.
- Secure by default: Products ship with security protections already enabled. Users should not have to opt in to basic security settings.
- Input sanitization: A function that removes or rejects control characters (single quote, double quote, semicolon) from user input before the application processes it. Helps block SQL injection, XSS, and directory traversal, provided the filtered characters include the ones each attack relies on.
- Data validation: Verifying that user input matches expected criteria (e.g., a number field only accepts digits) before processing. Applications that skip validation are vulnerable to injection attacks.
- Control characters: Characters like ' " ; that applications use to structure commands. Adversaries embed these in input to break out of expected processing and inject malicious instructions.
How does input sanitization specifically prevent a SQL injection attack?
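A minimal sketch of the validation and sanitization ideas above follows. The helper names, the four-digit limit on the quantity field, and the file-name rule are all assumptions made for this example; a production application would normally rely on its framework's vetted validation and escaping features instead of hand-written checks.

```python
import re

# The control characters named in the notes: single quote, double quote, semicolon.
CONTROL_CHARACTERS = set("'\";")


def is_valid_quantity(text: str) -> bool:
    """Data validation: a number field accepts only 1 to 4 ASCII digits."""
    return re.fullmatch(r"[0-9]{1,4}", text) is not None


def contains_control_characters(text: str) -> bool:
    """Sanitization check: flag input that carries SQL-style control characters."""
    return any(ch in CONTROL_CHARACTERS for ch in text)


def is_safe_filename(text: str) -> bool:
    """Reject names that could climb out of the intended directory."""
    if text in ("", ".", ".."):
        return False
    return "/" not in text and "\\" not in text


print(is_valid_quantity("42"))
print(contains_control_characters("nobody' OR 1=1 --"))
print(is_safe_filename("../etc/passwd"))
```

Rejecting input outright is simpler to reason about than trying to repair it, but it can refuse legitimate values such as a surname containing an apostrophe, which is one reason these checks are usually paired with other defenses.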
5.6
Detecting Attacks on Data and Applications
Detective controls identify attacks that preventive controls missed. The main tools are log analysis, honeypots, cryptographic hash verification, and DLP services. Each has different costs, detection speeds, and blind spots that defenders must understand.
- Accounting (log analysis): Recording and monitoring user activity. Logs reveal suspicious patterns like accessing unusual files, activity outside normal hours, or attempts to copy or delete sensitive data.
- Honeypot: A fake file containing realistic-looking but false data (credit card numbers, passwords). Any access attempt triggers an alert because there is no legitimate reason to open it.
- Cryptographic hash function: Produces a fixed-length output (hash) for any input. If a file's hash changes between two measurements, the file was altered. SHA-256 is the standard; MD5 and SHA-1 are considered weak.
- DLP (Data Loss Prevention): Third-party services that monitor data access, usage, and transmission across an organization. High detection capability at higher cost than honeypots or hashing.
- Detection blind spots: Hash functions cannot detect read-only data theft because the file is not altered. Honeypots cannot detect adversaries who never access them. False negatives are a real risk in every detection method.
An adversary reads a sensitive file but does not modify it. Which detection tools would catch this, and which would not?
| Detective Control | Cost | Detection timing | Key limitation |
|---|---|---|---|
| Log analysis (manual) | Low | Retrospective | Requires human review; slow without automation |
| Automated log analysis | Medium | Real-time | Requires configuration; can produce false positives |
| Honeypot | Low | Near-instant | Only detects adversaries who access the honeypot |
| Cryptographic hash | Low | Retrospective | Cannot detect read-only theft; only detects modification |
| DLP service | High | Real-time | Cost may be prohibitive for smaller organizations |
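The hash row's limitation can be demonstrated with Python's standard hashlib module. The file contents below are made-up bytes standing in for a real file, and the variable names are hypothetical. The sketch shows that a modification changes the SHA-256 digest, while a read-only copy leaves it untouched.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# Made-up file contents standing in for a sensitive file.
original = b"account=1001;balance=250.00"
baseline = sha256_hex(original)          # first measurement

# Read-only theft: the adversary copies the data but changes nothing.
stolen_copy = bytes(original)
after_read = sha256_hex(original)        # second measurement

# Tampering: the adversary alters one field.
tampered = original.replace(b"250.00", b"950.00")
after_tamper = sha256_hex(tampered)

print("read-only theft detected:", after_read != baseline)
print("modification detected:", after_tamper != baseline)
```

Because the digest depends only on the file's contents, hash verification is an integrity check; catching the read-only case requires a different control such as log analysis, a honeypot, or DLP.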
Practice AP Cybersecurity unit 5 questions
Try AP-style multiple-choice questions and written prompts after you review the notes.
Question: A hospital's IT team discovers that a billing clerk accessed and exported patient diagnosis records, which the clerk's job function does not require. The team wants to implement an access control model that restricts each employee to only the data their specific job requires. Which model best addresses this gap, and why?
Role-based access control (RBAC), because it ties permissions to job roles so billing clerks only access billing objects and not clinical diagnosis records
Discretionary access control (DAC), because it ties permissions to job roles so billing clerks only access billing objects and not clinical diagnosis records
Rule-based access control (RuBAC), because it ties permissions to job roles so billing clerks only access billing objects and not clinical diagnosis records
Mandatory access control (MAC), because it ties permissions to job roles so billing clerks only access billing objects and not clinical diagnosis records
Question: A healthcare company stores patient medical records in a database encrypted with a 40-bit key, and access to the database is restricted only to employees in the billing department. A security auditor flags this configuration. Which risk level and rationale best describe this vulnerability?
Moderate risk, because regulated sensitive data has some encryption but the key length is too short to resist modern attacks
High risk, because regulated sensitive data is stored with no encryption and no access controls whatsoever on the system
Low risk, because access controls limit database access to authorized billing staff, reducing the likelihood of a breach
Moderate risk, because the billing department access controls are too broadly defined and allow unnecessary data exposure