Data preprocessing and feature engineering are crucial steps in preparing data for AI applications. These processes involve cleaning, transforming, and enhancing raw data to make it suitable for machine learning algorithms.

From data collection and cleaning to advanced feature engineering techniques, these steps ensure data quality and create meaningful features. By addressing issues like missing values, outliers, and inconsistencies, we can significantly improve the accuracy and reliability of AI models.

Data Preprocessing for AI

Data Collection and Cleaning

  • Data preprocessing prepares data for AI applications through several key stages ensuring data quality and suitability for model training
  • Data collection and integration gather data from various sources (databases, APIs, web scraping) and combine it into a unified dataset, addressing data heterogeneity and incompatibility
  • Data cleaning identifies and corrects or removes errors, inconsistencies, and inaccuracies in the dataset, improving overall data quality
    • Remove duplicate records
    • Fix formatting issues (inconsistent date formats, capitalization)
    • Correct obvious errors (negative ages, impossible values)
  • Handling missing data uses techniques such as imputation, deletion, or specialized algorithms to address gaps in the dataset (see the imputation sketch after this list)
    • Mean/median imputation replaces missing values with average values
    • Multiple imputation generates multiple plausible values for missing data
    • Deletion removes records with missing values (can lead to loss of information)
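A minimal sketch of these options: the snippet below fills numeric gaps with each column's median and notes the deletion alternative. The DataFrame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical records with missing values (column names are illustrative)
df = pd.DataFrame({
    "age": [25, None, 47, 31, None],
    "income": [40_000, 52_000, None, 61_000, 58_000],
})

# Median imputation: replace gaps with the column median
imputer = SimpleImputer(strategy="median")
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])

# Alternative: deletion simply drops incomplete records (risks losing information)
# df = df.dropna()
```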

Data Transformation and Reduction

  • Data transformation techniques ensure data is in a suitable format for AI algorithms
    • Normalization scales numerical features to a common range (0-1)
    • Standardization transforms features to have zero mean and unit variance
    • Encoding converts categorical variables into numerical format (one-hot encoding, label encoding)
  • Outlier detection and treatment identify and manage extreme values that may skew analysis or model performance
    • Statistical methods (z-score, interquartile range)
    • Machine learning techniques (isolation forests, local outlier factor)
  • Data reduction techniques manage large datasets and improve model efficiency
    • Feature selection chooses the most relevant features (correlation-based, mutual information)
    • Dimensionality reduction reduces the number of features while preserving information (PCA, t-SNE); a combined sketch of these transformation and reduction steps follows this list
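The sketch below strings several of these steps together: min-max normalization, standardization, one-hot encoding, an IQR-based outlier flag, and PCA. The dataset, column names, and the two-component choice are illustrative assumptions, not a prescribed pipeline.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA

# Hypothetical dataset (column names and values are illustrative)
df = pd.DataFrame({
    "income": [42_000, 55_000, 61_000, 300_000, 48_000],
    "age": [23, 35, 41, 29, 52],
    "tenure_months": [6, 30, 54, 12, 80],
    "segment": ["retail", "enterprise", "retail", "smb", "smb"],
})
numeric = df[["income", "age", "tenure_months"]]

# Normalization: scale numeric features to the 0-1 range
normalized = MinMaxScaler().fit_transform(numeric)

# Standardization: transform to zero mean and unit variance
standardized = StandardScaler().fit_transform(numeric)

# Encoding: one-hot encode the categorical column
encoded = pd.get_dummies(df["segment"], prefix="segment")

# Outlier detection: flag incomes outside the 1.5 * IQR fences
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Dimensionality reduction: project the standardized numerics onto 2 principal components
components = PCA(n_components=2).fit_transform(standardized)
```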

Feature Engineering Techniques

Numerical and Categorical Feature Engineering

  • Feature engineering creates new features or modifies existing ones to improve performance and interpretability of AI models
  • Domain knowledge plays a crucial role in identifying relevant attributes and creating meaningful derived features
  • Numerical feature engineering techniques capture non-linear relationships or normalize data distributions
    • Binning groups continuous values into discrete categories (age groups, income brackets)
    • Scaling adjusts features to a specific range or distribution (min-max scaling, log transformation)
    • Mathematical transformations create new features (square root, exponential, trigonometric functions)
  • Categorical feature engineering converts categorical variables into a format suitable for machine learning algorithms
    • One-hot encoding creates binary columns for each category
    • Label encoding assigns numerical values to categories
    • Feature hashing converts high-cardinality categorical variables into a fixed-size vector (a combined sketch of these numerical and categorical techniques follows this list)
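A minimal sketch of these numerical and categorical techniques, using a hypothetical customer table whose column names and bin edges are chosen only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import LabelEncoder

# Hypothetical customer data (column names and values are illustrative)
df = pd.DataFrame({
    "age": [19, 34, 47, 62, 28],
    "income": [18_000, 54_000, 88_000, 120_000, 41_000],
    "city": ["Austin", "Boston", "Chicago", "Denver", "Austin"],
})

# Binning: group continuous ages into discrete brackets
df["age_group"] = pd.cut(df["age"], bins=[0, 25, 45, 65], labels=["young", "mid", "senior"])

# Mathematical transformation: log transform compresses a skewed income distribution
df["log_income"] = np.log1p(df["income"])

# One-hot encoding: one binary column per category
onehot = pd.get_dummies(df["city"], prefix="city")

# Label encoding: one integer per category (implies an ordering, so use with care)
df["city_id"] = LabelEncoder().fit_transform(df["city"])

# Feature hashing: map a (potentially high-cardinality) categorical into a fixed-size vector
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform([[c] for c in df["city"]]).toarray()
```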

Advanced Feature Engineering

  • Text feature engineering converts unstructured text data into numerical features (a combined sketch of the techniques in this section follows the list)
    • Bag-of-words represents text as the frequency of words
    • TF-IDF (Term Frequency-Inverse Document Frequency) weighs the importance of words in a document
    • Word embeddings capture semantic relationships between words (word2vec, GloVe)
  • Time series feature engineering captures temporal patterns in data
    • Lag features use past values as predictors
    • Rolling statistics compute moving averages or other metrics over time windows
    • Seasonal components extract cyclical patterns (daily, weekly, yearly trends)
  • Feature interactions and polynomial features capture complex relationships between existing features
    • Multiplication of two features creates interaction term
    • Polynomial features generate higher-order terms (squares, cubes) to model non-linear relationships
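A compact sketch of text, time series, and interaction features (word embeddings are omitted); the ticket snippets, sales figures, and feature names are all hypothetical:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import PolynomialFeatures

# Text features: TF-IDF on a hypothetical set of support tickets
tickets = ["refund not processed", "slow checkout page", "refund issued late"]
tfidf_matrix = TfidfVectorizer().fit_transform(tickets)  # sparse document-term matrix

# Time series features: lag values and a rolling mean on hypothetical daily sales
sales = pd.DataFrame({"units": [120, 135, 128, 150, 161, 149, 170]})
sales["lag_1"] = sales["units"].shift(1)                           # yesterday's value
sales["rolling_mean_3"] = sales["units"].rolling(window=3).mean()  # 3-day moving average

# Interaction and polynomial features: squares and pairwise products of two features
features = pd.DataFrame({"price": [9.9, 14.5, 7.0], "ad_spend": [200, 350, 120]})
poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(features)
```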

Data Quality Impact on AI Models

Data Quality Issues and Their Effects

  • Data quality directly affects accuracy, reliability, and generalizability of AI models with poor quality data leading to biased or inaccurate predictions
  • Common data quality issues significantly impact model performance if not properly addressed
    • Missing values create incomplete information
    • Outliers skew statistical measures and model training
    • Inconsistencies in data representation lead to confusion in model learning
    • Noise obscures true patterns in the data
  • Class imbalance in datasets leads to biased models, necessitating resampling techniques to improve model fairness (see the rebalancing sketch after this list)
    • Oversampling increases minority class samples (SMOTE)
    • Undersampling reduces majority class samples
    • Synthetic data generation creates artificial samples to balance classes
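A minimal rebalancing sketch, assuming the third-party imbalanced-learn package is installed; the synthetic dataset and its 90/10 split exist only to illustrate the before/after class counts:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Build an illustrative imbalanced classification dataset (~90% class 0, ~10% class 1)
X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=42)
print("Before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between neighbors
X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
print("After:", Counter(y_balanced))  # classes are now roughly equal
```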

Monitoring and Maintaining Data Quality

  • Data drift and concept drift occur when the statistical properties of the data or the relationship between features and target change over time, affecting model performance
    • Data drift: changes in distribution of input features
    • Concept drift: changes in relationship between features and target variable
  • "Garbage in, garbage out" principle emphasizes sophisticated AI models cannot compensate for poor quality input data
  • Regular data quality assessments and monitoring maintain effectiveness of AI models in production environments
    • Implement data validation checks
    • Monitor data distributions over time
    • Set up alerts for significant changes in data characteristics
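One lightweight way to watch for data drift is to compare a live feature's distribution against its training distribution, for example with a two-sample Kolmogorov-Smirnov test; the simulated data and the 0.05 threshold below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_income = rng.normal(loc=50_000, scale=8_000, size=5_000)    # reference distribution
production_income = rng.normal(loc=56_000, scale=9_000, size=1_000)  # recent, shifted data

statistic, p_value = ks_2samp(training_income, production_income)
if p_value < 0.05:  # illustrative alerting threshold
    print(f"Possible data drift detected (KS statistic={statistic:.3f})")
```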
  • Validation techniques evaluate the impact of data quality on model performance and generalization capabilities (see the cross-validation sketch after this list)
    • Cross-validation assesses model performance on different subsets of data
    • Holdout validation tests model on completely unseen data
    • A/B testing compares model performance with different data quality improvements
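As a quick illustration of cross-validation and holdout validation, the sketch below scores a simple classifier on five folds and then on an unseen test split; the synthetic dataset and logistic regression model are arbitrary placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Cross-validation: average accuracy across 5 different train/validation splits
scores = cross_val_score(LogisticRegression(max_iter=1_000), X, y, cv=5)
print("CV accuracy:", scores.mean())

# Holdout validation: keep a final test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))
```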

Key Terms to Review (64)

Accuracy: Accuracy refers to the degree to which a result or measurement conforms to the correct value or standard. In AI and machine learning, accuracy is crucial as it indicates how well an algorithm or model performs in making predictions or classifications, reflecting the effectiveness of various algorithms and techniques in real-world applications.
Bag-of-words: The bag-of-words model is a popular method in natural language processing that represents text data by treating each document as a collection of words, disregarding grammar and word order. This approach simplifies text representation, making it easier to analyze and compare documents based on word frequency and presence, which plays a vital role in data preprocessing and feature engineering.
Binning: Binning is a data preprocessing technique that involves grouping a set of numerical values into discrete categories or 'bins'. This technique helps to reduce the effects of minor observation errors and can simplify models by transforming continuous data into categorical data, making it easier for algorithms to analyze and interpret. Binning is particularly useful in feature engineering as it enhances the effectiveness of predictive modeling by converting numeric attributes into categorical ones, allowing for better handling of the data.
Categorical feature engineering: Categorical feature engineering is the process of transforming categorical variables into a format that can be effectively used by machine learning algorithms. This transformation often involves encoding methods, such as one-hot encoding or label encoding, to convert non-numeric categories into numerical representations. By handling categorical data properly, models can better learn from these features and improve their predictive performance.
Class imbalance: Class imbalance refers to a situation in a dataset where the number of observations in different classes is not approximately equal, leading to skewed distributions. This can cause machine learning models to favor the majority class, making them less effective at predicting the minority class. Properly addressing class imbalance is crucial during data preprocessing and feature engineering to ensure balanced model performance.
Concept Drift: Concept drift refers to the change in the underlying relationships in data over time, affecting the performance of predictive models. This phenomenon occurs when the statistical properties of the target variable change, leading to a decline in model accuracy if it is not updated or retrained. Understanding concept drift is crucial for maintaining the relevance and accuracy of machine learning models, especially in dynamic environments where data patterns can evolve.
Data cleaning: Data cleaning is the process of identifying and correcting errors and inconsistencies in data to improve its quality and usability for analysis. This essential step ensures that data is accurate, complete, and formatted correctly, which significantly impacts the effectiveness of data preprocessing and feature engineering. By refining datasets, it enhances the model's performance and reliability, leading to better decision-making in business applications.
Data collection: Data collection is the systematic process of gathering and measuring information from various sources to gain insights or answer specific questions. This process is critical in data preprocessing and feature engineering, as it lays the foundation for effective analysis and model development. By collecting high-quality data, businesses can ensure that subsequent steps in the data analysis pipeline are based on reliable and relevant information, which ultimately enhances decision-making and predictive accuracy.
Data drift: Data drift refers to the change in the statistical properties of a dataset over time, which can affect the performance of machine learning models. It can occur due to shifts in the underlying data distribution, changes in external factors, or evolving user behavior. Understanding data drift is crucial for maintaining model accuracy and effectiveness as it can lead to outdated predictions if not addressed promptly.
Data heterogeneity: Data heterogeneity refers to the diverse types and formats of data that can exist within a dataset, including structured, semi-structured, and unstructured data. This variety poses challenges during the preprocessing and feature engineering stages, as it requires tailored methods to clean, transform, and extract useful features from the data, ensuring compatibility for effective analysis and modeling.
Data incompatibility: Data incompatibility refers to the challenges that arise when integrating or comparing datasets that are not compatible due to differences in structure, format, or semantics. This issue often hampers data preprocessing and feature engineering, as it can lead to difficulties in data analysis and model performance. When datasets vary in representation, such as using different units of measurement or having mismatched categories, it can complicate the process of extracting meaningful insights.
Data Integration: Data integration is the process of combining data from different sources to provide a unified view, ensuring that all relevant data is accessible for analysis and decision-making. This process is essential in various contexts, as it enables organizations to derive meaningful insights from disparate datasets, leading to improved decision-making and efficiency. Effective data integration involves cleaning, transforming, and consolidating data, ensuring that it is ready for analysis and can drive business strategies.
Data preprocessing: Data preprocessing is the process of cleaning, transforming, and organizing raw data into a suitable format for analysis and modeling. This step is crucial as it directly impacts the quality and performance of machine learning algorithms, ensuring that the data used is accurate and relevant for drawing insights. Effective data preprocessing can significantly enhance the performance of machine learning models in various applications, helping organizations make better decisions based on data-driven insights.
Data Quality: Data quality refers to the overall utility of a dataset as a function of its accuracy, completeness, reliability, and relevance for a specific purpose. High data quality is essential in various processes such as analysis, decision-making, and forecasting, as it directly impacts the effectiveness and success of artificial intelligence applications in business. Ensuring high data quality involves rigorous data validation, cleansing, and management practices, which are crucial at every stage from data collection to preprocessing and analysis.
Data quality issues: Data quality issues refer to problems that affect the accuracy, completeness, consistency, and reliability of data within a dataset. These issues can arise from various sources such as data entry errors, outdated information, or discrepancies between different data systems. Addressing these issues is crucial for effective data preprocessing and feature engineering, successful AI project management, and reliable sales forecasting and optimization.
Data reduction: Data reduction is the process of reducing the volume of data while preserving its essential characteristics and information content. This technique is crucial in managing large datasets, making them easier to analyze and interpret without losing significant insights. By applying various methods like dimensionality reduction or data compression, data reduction helps streamline data preprocessing and feature engineering, enhancing model performance and speeding up computations.
Deletion: Deletion is the process of removing certain data points or variables from a dataset to enhance its quality and usability. This is crucial in preparing data for analysis, as it helps to eliminate noise, reduce dimensionality, and prevent potential biases that can arise from incomplete or irrelevant information.
Dimensionality Reduction: Dimensionality reduction is a process used in machine learning and statistics to reduce the number of input variables in a dataset while preserving essential information. This technique helps simplify models, enhance visualization, and improve computational efficiency. By transforming high-dimensional data into a lower-dimensional space, it makes it easier to analyze and interpret the data, which is crucial for developing effective algorithms and techniques.
Domain Knowledge: Domain knowledge refers to the specialized understanding and expertise in a particular area or field that is crucial for effectively addressing problems and making decisions within that context. This type of knowledge enables practitioners to identify relevant data, apply appropriate methodologies, and interpret results accurately, which is especially important when working with data preprocessing and feature engineering in machine learning projects.
Encoding: Encoding is the process of converting data into a format that can be efficiently processed, analyzed, and utilized by machine learning algorithms. This transformation helps in handling various data types and ensures that the input features are in a suitable numerical format for models to understand. Encoding can also involve the organization of categorical data into numerical representations, which is essential for building robust predictive models.
Feature Engineering: Feature engineering is the process of using domain knowledge to select, modify, or create new variables (features) that can improve the performance of machine learning models. This technique is essential as it directly impacts how well algorithms learn from data, which is crucial for tasks such as prediction and classification.
Feature Hashing: Feature hashing is a technique used to convert large sets of features into a fixed-size vector by applying a hash function. This method is particularly useful in situations where datasets contain a vast number of categorical variables, allowing for efficient storage and processing while maintaining the ability to capture essential patterns in the data. By using feature hashing, data scientists can simplify their models and reduce the dimensionality of their datasets without losing significant information.
Feature Interaction: Feature interaction refers to the phenomenon where the combined effect of two or more features in a dataset influences the output of a predictive model in a way that is not simply additive. This interaction can lead to complex relationships that are crucial for understanding how different features contribute to the final predictions. Recognizing and capturing feature interactions during data preprocessing and feature engineering can significantly enhance the model's performance by ensuring that important nonlinear relationships are taken into account.
Feature Selection: Feature selection is the process of identifying and selecting a subset of relevant features or variables that contribute most to the predictive power of a machine learning model. This process is crucial as it can enhance model performance, reduce overfitting, and improve interpretability by removing irrelevant or redundant data. Effective feature selection allows algorithms to focus on the most informative aspects of the data, ultimately leading to more accurate predictions.
GloVe: In the context of data preprocessing and feature engineering, GloVe (Global Vectors for Word Representation) is a method for generating word embeddings that captures the semantic meaning of words based on their context in a corpus. This technique utilizes global statistical information to learn relationships between words, allowing for a deeper understanding of language by converting words into high-dimensional vectors that can be used in machine learning algorithms.
High-cardinality categorical variables: High-cardinality categorical variables are those that contain a large number of unique categories or levels. These variables can pose challenges during data preprocessing and feature engineering, as traditional encoding methods may not be effective in representing the abundance of unique values, leading to increased dimensionality and potential overfitting in machine learning models.
Imputation: Imputation is the process of replacing missing data with substituted values to maintain the integrity of a dataset. This technique is crucial in data preprocessing because missing values can lead to biased analyses and inaccurate predictions, ultimately affecting the performance of machine learning models. Various imputation methods, such as mean, median, or mode substitution, as well as more advanced techniques like k-nearest neighbors or regression-based methods, can be employed depending on the nature of the data and the extent of missingness.
Interquartile Range: The interquartile range (IQR) is a statistical measure that represents the difference between the upper quartile (Q3) and the lower quartile (Q1) of a dataset. It effectively captures the spread of the middle 50% of data points, helping to identify variability and outliers. Understanding the IQR is crucial for data preprocessing and feature engineering, as it informs decisions on data normalization, transformation, and outlier detection.
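For reference, with lower and upper quartiles $Q_1$ and $Q_3$, the spread and the commonly used 1.5 × IQR outlier fences are:

$$\text{IQR} = Q_3 - Q_1, \qquad x < Q_1 - 1.5\,\text{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\text{IQR}$$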
Isolation Forests: Isolation forests are a type of anomaly detection algorithm that works by isolating observations in a dataset. The key idea behind this method is that anomalies, or outliers, are less frequent and tend to be easier to isolate than normal observations. By constructing a random forest of trees and measuring how quickly data points can be isolated, this technique can effectively identify outliers and provide insights into the underlying data distribution, which is crucial for tasks like data cleaning and quality assurance.
Label encoding: Label encoding is a technique used to convert categorical variables into numerical values, assigning each unique category a specific integer. This process is essential in data preprocessing as many machine learning algorithms require numerical input to function properly. By transforming categories into numbers, label encoding helps maintain the ordinal relationship between categories, when applicable, while simplifying the data for analysis.
Lag Features: Lag features are specific variables created in time series data that represent past observations, essentially allowing models to utilize previous values to predict future outcomes. These features help capture temporal dependencies, which are crucial for time-based predictions and improve the accuracy of forecasting models by enabling them to recognize patterns over time.
Local Outlier Factor: The Local Outlier Factor (LOF) is an algorithm used for identifying outliers in data based on the density of data points in the local neighborhood. It assigns a score to each point, which indicates how isolated or anomalous it is compared to its surrounding data points. This method is particularly useful in datasets with varying densities, as it can effectively differentiate between local outliers and global outliers.
Log Transformation: Log transformation is a mathematical technique used to convert a skewed distribution into a more normal distribution by applying the logarithm function to the data values. This process is particularly useful in data preprocessing and feature engineering, as it helps stabilize variance, makes patterns more discernible, and meets the assumptions required for many statistical methods and machine learning algorithms.
Machine learning techniques: Machine learning techniques refer to a set of algorithms and statistical methods that enable computers to learn from and make predictions based on data without being explicitly programmed. These techniques are essential for analyzing patterns, making decisions, and improving performance as more data becomes available, thus forming the backbone of many advanced applications in various fields, including business.
Mathematical Transformations: Mathematical transformations are operations that modify the structure or representation of data to enhance its interpretability and usability in analysis. These transformations can involve scaling, shifting, rotating, or applying more complex functions to data points, allowing for better feature representation in modeling. They play a crucial role in data preprocessing and feature engineering by optimizing input data for algorithms, making it easier to uncover patterns and improve predictive accuracy.
Mean imputation: Mean imputation is a statistical technique used to fill in missing values in a dataset by replacing them with the mean of the available data points. This method helps maintain the overall dataset size and can be useful for ensuring that analyses do not lose valuable information due to missing data. While it is straightforward and quick to implement, mean imputation can sometimes lead to biased estimates and reduced variability in the dataset.
Min-max scaling: Min-max scaling is a data normalization technique that transforms features to lie within a specified range, typically [0, 1]. This method ensures that the minimum value of a feature maps to 0 and the maximum value maps to 1, making it easier to compare different features on a similar scale. By doing so, min-max scaling helps improve the performance of machine learning algorithms that rely on distance calculations or gradient-based optimization.
Missing data: Missing data refers to the absence of values in a dataset, which can occur for various reasons such as errors in data collection, non-responses in surveys, or data corruption. This absence can significantly impact data analysis and machine learning models, as they rely on complete datasets to produce accurate insights and predictions. Addressing missing data is crucial in data preprocessing and feature engineering to ensure the integrity and usability of the data.
Multiple imputation: Multiple imputation is a statistical technique used to handle missing data by creating several different plausible datasets and analyzing them separately. This method improves the accuracy of estimations by incorporating the uncertainty associated with missing values, leading to more robust results in data preprocessing and feature engineering.
Non-linear relationships: Non-linear relationships refer to connections between variables that do not follow a straight line when graphed. These relationships can take various forms, such as quadratic, exponential, or logarithmic patterns. Understanding non-linear relationships is crucial because they can reveal complex interactions between features that linear models might overlook, making them essential in predictive modeling and data analysis.
Normalization: Normalization is a data preprocessing technique used to scale numeric data into a specific range, typically between 0 and 1, or to standardize it to have a mean of 0 and a standard deviation of 1. This process helps in ensuring that different features contribute equally to the distance calculations in algorithms, which is particularly important in machine learning models that rely on distance measures, such as k-nearest neighbors or support vector machines.
Numerical feature engineering: Numerical feature engineering involves the creation, transformation, and selection of numerical variables in a dataset to improve the performance of machine learning models. This process is crucial for preparing data in a way that helps algorithms better understand patterns, relationships, and insights. Effective numerical feature engineering can lead to improved accuracy, reduced training time, and enhanced model interpretability, making it an essential part of data preprocessing and feature engineering.
One-hot encoding: One-hot encoding is a technique used to convert categorical variables into a numerical format that machine learning algorithms can work with. This method represents each category as a binary vector, where only one element is 'hot' (set to 1) and all other elements are 'cold' (set to 0). This transformation helps preserve the information in categorical data while avoiding the pitfalls of assigning arbitrary numerical values that could imply an undesired ordinal relationship between categories.
Outlier Detection: Outlier detection is the process of identifying data points that significantly deviate from the majority of data within a dataset. These outliers can indicate anomalies, errors, or interesting variations that could provide valuable insights for data analysis. Detecting outliers is crucial during data preprocessing as they can skew statistical analyses and mislead machine learning models, making it important to address them appropriately in feature engineering.
Outlier Treatment: Outlier treatment involves identifying and managing data points that significantly differ from the rest of the dataset. These outliers can distort statistical analyses and lead to inaccurate model predictions, so it’s essential to address them during data preprocessing and feature engineering. By effectively treating outliers, you can enhance the quality of your data, leading to more reliable machine learning outcomes.
Oversampling: Oversampling is a technique used in data preprocessing to increase the number of instances in a minority class within an imbalanced dataset. This approach helps to create a more balanced representation of classes, ensuring that machine learning algorithms can learn effectively from all classes without being biased towards the majority. By generating synthetic samples or duplicating existing ones, oversampling aims to enhance the model's performance, particularly in classification tasks where the minority class is of significant interest.
PCA: Principal Component Analysis (PCA) is a statistical technique used to simplify complex datasets by reducing their dimensionality while preserving as much variance as possible. It transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the data. This process is essential for data preprocessing and feature engineering, as it helps in identifying the most important features and reducing noise.
Polynomial features: Polynomial features refer to the transformation of input variables in a dataset into their polynomial combinations, allowing for the modeling of complex relationships in machine learning. By adding these higher-degree terms, polynomial features enable algorithms to capture non-linear patterns that linear models might miss, improving predictive performance and flexibility.
Reliability: Reliability refers to the consistency and dependability of a measurement or data source over time. In the context of data preprocessing and feature engineering, reliability is crucial because it impacts the quality of the data used for training machine learning models. High reliability ensures that the features extracted from data consistently represent the underlying phenomena, leading to more accurate and trustworthy predictions.
Rolling statistics: Rolling statistics refer to a method of calculating statistical measures, such as mean, median, variance, or standard deviation, over a moving window of data points. This technique allows for the analysis of trends and patterns in time series data by continuously updating the statistics as new data becomes available. Rolling statistics are essential in understanding how data behaves over time and can help in identifying temporal patterns or anomalies.
Scaling: Scaling refers to the process of adjusting the range of features in a dataset, making them uniform and comparable for machine learning algorithms. This is crucial because many algorithms perform better when the features are on a similar scale, ensuring that no single feature dominates or skews the results. Proper scaling can significantly enhance the performance and accuracy of models by improving convergence speed and reducing computational complexity.
Seasonal components: Seasonal components refer to the predictable patterns or fluctuations in data that occur at regular intervals due to seasonal factors, such as weather changes, holidays, or economic cycles. These components are crucial in time series analysis as they help in understanding and forecasting trends by distinguishing between short-term variations and longer-term patterns.
SMOTE: SMOTE (Synthetic Minority Over-sampling Technique) is a data preprocessing technique that addresses class imbalance in datasets by oversampling the minority class. It works by creating synthetic examples of the minority class to improve the performance of machine learning models, particularly when dealing with classification tasks where one class is underrepresented.
Standardization: Standardization is the process of transforming data to have a mean of zero and a standard deviation of one. This technique is crucial in preparing datasets for analysis, especially when different features have different units or scales, ensuring that no single feature dominates the learning process of algorithms. By applying standardization, it becomes easier to compare features and improve the performance of machine learning models.
Statistical Methods: Statistical methods refer to the collection, analysis, interpretation, presentation, and organization of data through mathematical formulas and techniques. These methods are essential in making sense of large datasets, allowing businesses to derive insights and inform decision-making processes based on empirical evidence. In the realm of data preprocessing and feature engineering, statistical methods play a critical role in understanding data distributions, cleaning data, and selecting or transforming features for predictive modeling.
Synthetic data generation: Synthetic data generation is the process of creating artificially generated data that mimics real-world data without revealing any personal or sensitive information. This method is useful for training machine learning models, testing software, and conducting simulations while addressing privacy concerns and data scarcity. By generating synthetic datasets, organizations can avoid potential biases and improve the robustness of their AI systems.
T-SNE: t-SNE, or t-distributed Stochastic Neighbor Embedding, is a machine learning algorithm used for dimensionality reduction that helps visualize high-dimensional data in lower dimensions, typically two or three. This technique is particularly effective at preserving the local structure of the data while revealing global structures like clusters, making it an essential tool for data preprocessing and visualization tasks.
Text feature engineering: Text feature engineering is the process of transforming raw text data into numerical representations that machine learning algorithms can understand and use for analysis. This involves various techniques such as tokenization, stemming, lemmatization, and the creation of features like term frequency-inverse document frequency (TF-IDF) or word embeddings. By refining text data into structured formats, text feature engineering enhances the performance and accuracy of models used in natural language processing tasks.
Tf-idf: TF-IDF, or Term Frequency-Inverse Document Frequency, is a numerical statistic used to evaluate the importance of a word in a document relative to a collection of documents or corpus. It helps in identifying which words are significant within a specific text by balancing how frequently they appear in the text (term frequency) against how common they are across all documents (inverse document frequency). This balance is crucial in data preprocessing and feature engineering as it aids in transforming raw text into meaningful features for machine learning models.
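A common formulation (library implementations differ in smoothing and normalization details) weights term $t$ in document $d$, within a corpus of $N$ documents of which $\text{df}(t)$ contain the term, as:

$$\text{tfidf}(t, d) = \text{tf}(t, d) \times \log\frac{N}{\text{df}(t)}$$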
Time series feature engineering: Time series feature engineering is the process of transforming raw time series data into a format that is more suitable for machine learning models. This involves creating new features based on existing data, such as lagged variables, rolling statistics, and seasonal indicators, to capture the underlying patterns and trends over time. Properly engineered features can significantly enhance model performance by providing richer information about temporal dependencies.
Undersampling: Undersampling is a technique used in data preprocessing to balance the distribution of classes in a dataset by reducing the number of instances in the majority class. This approach is often employed when dealing with imbalanced datasets, where one class significantly outnumbers another, which can lead to biased predictive models. By removing some instances from the majority class, undersampling aims to create a more equitable dataset for training machine learning models.
Word embeddings: Word embeddings are numerical representations of words that capture their meanings and relationships in a continuous vector space. This technique allows words with similar meanings to be positioned close together in that space, facilitating better understanding and processing of natural language in AI applications. By transforming words into vectors, word embeddings play a crucial role in improving the performance of various AI algorithms, especially those that involve text data.
Word2vec: word2vec is a technique used in Natural Language Processing that transforms words into numerical vector representations, allowing algorithms to understand the relationships between words based on their contexts. This method captures semantic meanings, enabling tasks such as similarity measurement and clustering of words. It’s particularly useful in data preprocessing and feature engineering, as it converts raw text into meaningful numerical formats that can be processed by machine learning models.
Z-score: A z-score is a statistical measurement that describes a value's relationship to the mean of a group of values, representing how many standard deviations a data point is from the mean. This metric helps in identifying outliers and comparing different data sets with different means and standard deviations, making it essential for data preprocessing and feature engineering.
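For reference, the z-score of a value $x$ from a distribution with mean $\mu$ and standard deviation $\sigma$ is:

$$z = \frac{x - \mu}{\sigma}$$

A common convention treats $|z| > 3$ as a potential outlier.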