In AP Computer Science Principles, correlation is a relationship found in data where two variables change together in a consistent pattern. Per EK DAT-2.A.3, a correlation does not necessarily mean one variable causes the other; additional research is needed to find the true relationship.
Correlation is a pattern you can pull out of data when two variables tend to move together. As one goes up, the other consistently goes up (positive correlation) or consistently goes down (negative correlation). When you process a dataset digitally, like comparing daily temperature readings with precipitation amounts, correlations are one of the main kinds of information you can extract. The Course and Exam Description (CED) says this directly: information is the collection of facts and patterns extracted from data, and digitally processed data may show correlation between variables.
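To make positive and negative correlation concrete, here is a minimal Python sketch. It assumes Pearson's r as the measure of correlation, which AP CSP does not ask you to compute by hand. The helper name pearson_r is hypothetical, and the readings are made-up illustrative values, not real measurements.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together around their means
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return together / (spread_x * spread_y)

# Made-up daily readings (illustrative only)
temperature_c = [10, 15, 20, 25, 30]
hours_of_sun = [2, 4, 5, 7, 9]        # tends to rise with temperature
precipitation_mm = [12, 9, 7, 4, 1]   # tends to fall as temperature rises

r_pos = pearson_r(temperature_c, hours_of_sun)
r_neg = pearson_r(temperature_c, precipitation_mm)
print("temperature vs sun:", round(r_pos, 2))
print("temperature vs precipitation:", round(r_neg, 2))
```

A value near +1 means the two lists rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no consistent straight-line pattern. None of these values says anything about why the variables move together.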
Here's the part AP CSP actually cares about. Finding a correlation is the beginning of an investigation, not the end. EK DAT-2.A.3 spells it out: a correlation found in data does not necessarily indicate that a causal relationship exists. Two variables can move together because of coincidence, because of a third hidden variable, or because the data themselves are messy or biased. The classic example is the strong negative correlation between the number of pirates and global temperatures over 300 years. The pattern is real in the data, but pirates obviously don't control the climate. To understand what's actually going on, you need additional research, and often you need to combine data from multiple sources, since a single dataset usually can't support a full conclusion on its own.
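The hidden-third-variable case is easy to simulate. The sketch below is an illustration under stated assumptions, not something from the CED: the variables hidden, a, and b are hypothetical, the data are randomly generated, and Pearson's r (via a hypothetical pearson_r helper) is assumed as the measure of correlation. Neither a nor b affects the other, yet they correlate strongly because both depend on hidden.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 500

# A hidden variable nobody recorded in the dataset
hidden = [random.uniform(0, 10) for _ in range(n)]

# a and b each depend on the hidden variable plus their own noise.
# Neither one is computed from the other.
a = [h + random.gauss(0, 1) for h in hidden]
b = [h + random.gauss(0, 1) for h in hidden]
r_observed = pearson_r(a, b)

# "Additional research": set a to values chosen independently of everything.
# If a really caused b, b would have to change too. Here b is unaffected.
a_forced = [random.uniform(0, 10) for _ in range(n)]
r_after_forcing = pearson_r(a_forced, b)

print("observed correlation:", round(r_observed, 2))
print("correlation after forcing a:", round(r_after_forcing, 2))
```

The observed correlation is strong, but it mostly disappears once a is set independently. That gap is the difference between a pattern in the data and a causal link, and it is why the dataset alone could not have settled the question.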
Correlation lives in Topic 2.3 (Extracting Information from Data) in Unit 2: Data, under learning objective DAT-2.A (describe what information can be extracted from data). The supporting essential knowledge, especially EK DAT-2.A.3 and EK DAT-2.A.4, gives you the two moves the exam expects. First, recognize that finding patterns and trends is exactly what data analysis is for. Second, stay skeptical about what those patterns mean. AP CSP is partly a course about reasoning responsibly with computing, and the correlation-versus-causation distinction is one of the clearest places that shows up. If a question describes a dataset and a discovered relationship, the exam is almost always probing whether you'll jump to a causal conclusion or correctly say more investigation is needed.
Keep studying AP Computer Science Principles Unit 2
Causation (Unit 2)
This is the pairing the exam loves. Correlation says two things move together; causation says one actually makes the other happen. EK DAT-2.A.3 exists specifically to stop you from treating the first as proof of the second.
Scatter Plot (Unit 2)
A scatter plot is how you usually see a correlation. Plot two variables against each other, and a correlation shows up as the points clustering into an upward or downward trend instead of a random cloud.
Cleaning Data (Unit 2)
Correlations are only as trustworthy as the data behind them. Incomplete, invalid, or non-uniform data (EK DAT-2.C.2) can create fake correlations or hide real ones, which is why cleaning data comes before drawing conclusions.
Data Bias (Unit 2)
A biased dataset can produce a correlation that reflects how the data were collected rather than how the world works. That's one more reason a correlation alone can't justify a conclusion, especially when the stakes are real, like public policy.
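The Cleaning Data point above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the placeholder value -999 for a failed reading, the helper names pearson_r and drop_missing, and all of the numbers are hypothetical. Pearson's r is assumed as the measure of correlation.

```python
from math import sqrt

MISSING = -999  # assumed placeholder written when a reading fails

def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)

def drop_missing(xs, ys):
    """Keep only the rows where neither value is the MISSING placeholder."""
    kept = [(x, y) for x, y in zip(xs, ys) if x != MISSING and y != MISSING]
    return [x for x, _ in kept], [y for _, y in kept]

# Case 1: a real upward trend, hidden by one invalid reading
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 5, 8, 10, 12, 13, MISSING]
r_hidden_raw = pearson_r(x, y)
r_hidden_clean = pearson_r(*drop_missing(x, y))

# Case 2: two weakly related columns whose readings failed on the same rows
u = [4, 6, 5, 7, 5, 6, MISSING, MISSING]
v = [6, 5, 7, 5, 4, 6, MISSING, MISSING]
r_fake_raw = pearson_r(u, v)
r_fake_clean = pearson_r(*drop_missing(u, v))

print("case 1 raw vs cleaned:", round(r_hidden_raw, 2), round(r_hidden_clean, 2))
print("case 2 raw vs cleaned:", round(r_fake_raw, 2), round(r_fake_clean, 2))
```

In the first case a single invalid value is enough to bury a strong positive trend. In the second, shared placeholder values manufacture a strong correlation that the valid readings don't support. Dropping rows is only one possible cleaning choice, and it can introduce bias of its own if the readings tend to fail under particular conditions.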
Correlation shows up in multiple-choice questions, and the trap is always the same. A question describes a study or dataset where two variables move together, then asks what you can conclude. The credited answer is the cautious one. Practice questions in this style include one built on the strong negative correlation between pirate populations and global temperatures (the insight is that correlation doesn't imply causation), and one about using big data for public policy, where the primary concern with relying only on correlational findings is that you might act on a relationship that isn't causal. Another common stem asks which approach gives the most reliable conclusion about a suspected relationship, like the one between social media use and grades. The answer points toward additional research and combining multiple data sources (EK DAT-2.A.4), not trusting a single correlation. Your job on these questions is to extract the pattern, name it as a correlation, and refuse to upgrade it to causation without more evidence.
Correlation means two variables change together in a pattern. Causation means a change in one variable actually produces a change in the other. A causal relationship typically shows up as a correlation, but plenty of correlations are not causal. They can come from coincidence, a third lurking variable, or biased data. AP CSP's EK DAT-2.A.3 is blunt about this: a correlation found in data does not necessarily indicate a causal relationship, and additional research is needed to understand the exact nature of the relationship. When an MCQ tempts you with an answer like 'X causes Y' based on a dataset alone, that answer is almost certainly wrong.
Correlation is a pattern in data where two variables move together consistently, either in the same direction (positive) or opposite directions (negative).
EK DAT-2.A.3 is the testable core: a correlation found in data does not necessarily mean a causal relationship exists.
Finding the true nature of a relationship requires additional research, and often combining data from multiple sources, since one dataset usually isn't enough to draw a conclusion (EK DAT-2.A.4).
Messy data can fake or hide correlations, so cleaning data and checking for bias come before trusting any pattern you find.
On the AP exam, the right answer to 'what can we conclude from this correlation?' is almost always the careful one, not the causal claim.
Correlation is a relationship extracted from data where two variables change together in a consistent pattern. It's covered in Topic 2.3 (Extracting Information from Data) in Unit 2, under EK DAT-2.A.3.
No, correlation does not imply causation. EK DAT-2.A.3 states directly that a correlation found in data does not necessarily indicate a causal relationship, and additional research is needed to understand the actual relationship. This is one of the most reliably tested ideas in Unit 2.
Correlation means two variables move together; causation means one actually causes the other to change. The pirates-and-global-temperatures example shows a strong negative correlation with zero causation, since pirates obviously don't control climate.
You find a correlation by digitally processing a dataset, often by plotting two variables on a scatter plot and looking for a consistent trend. For example, comparing 10 years of daily temperature and precipitation readings could reveal whether the two move together.
A correlation alone isn't enough to support a conclusion because it might come from coincidence, a hidden third variable, or biased data rather than a real causal link. That's the primary concern with using correlational findings alone to inform public policy, and the CED says a single source often can't support a conclusion on its own.