Fiveable

📊AP Statistics Review

AP Stats Mixed Units Practice FRQ #4 & Feedback

Practicing with FRQs is a great way to prep for the AP exam! Review student responses to an FRQ that combines multiple units, along with the corresponding feedback from Fiveable teacher Jerry Kosoff.

The Mixed Units FRQ

A researcher in a city with a large public train system wondered if the rent prices of one-bedroom apartments were related to the distance from the nearest train station. From a list of 250 similarly sized one-bedroom apartments in the city, the researcher selected a simple random sample of 20 apartments. The researcher then measured the walking distance, in minutes, to the nearest train station and created a scatterplot comparing the walking distances to the advertised weekly rent, in dollars. A scatterplot and the output from computer regression software are shown below.

a. Explain a procedure by which the researcher may have selected the simple random sample.

b. Describe the association between walking distance from the nearest train station and weekly rent for the apartments included in the sample.

c. Interpret the value of the coefficient of determination (r-squared) in the context of this problem.

d. Before examining the data, a second researcher makes a prediction that for each additional minute of walking distance, the weekly rent will decrease by approximately $2. A 95% confidence interval for the slope of the regression line is constructed from the data, and is found to be (-2.471, -1.845). Does the confidence interval contradict the researcher’s claim? Justify your answer.

e. The second researcher wants to conduct a study similar to the first researcher’s. However, in the second researcher’s study, a mixture of one-, two-, and three-bedroom apartments is selected. Do you expect the value of r-squared for the second study to be greater than, less than, or equal to the value of r-squared in the first study? Justify your response.

FRQ Writing Samples and Teacher Feedback

Student Response 1

a. The researcher could have assigned each apartment in the sample a number from 1-250 and used a random number generator to choose 20 apartments, taking out any repeats.

b. There is a strong, negative linear correlation between the weekly rent and distance from the nearest train station. (should I say correlation or association?)

c. 92.1% of the variability in the weekly rent dollars can be accounted for by the variability in the distance from nearest train station.

d. No it does not, because $2 falls within the given confidence interval.

e. I expect the r-squared value to be less than in the first study. Including one-, two-, and three-bedroom apartments would add variability in weekly rent that is not explained by distance to the train station alone. That extra unexplained variability would tend to weaken the linear relationship and increase the scatter around the regression line, so a smaller proportion of the variation in rent would be explained by walking distance.

Teacher Feedback

Little things: in (a), you must specify that the 20 numbers should be selected from the integers 1 through 250, with repeats ignored or skipped.
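
If it helps to see the random-number-generator procedure written out, here is a minimal sketch in Python. The function name `select_srs` and the seed are illustrative assumptions, not part of the original problem. It labels the apartments 1 through 250 and draws 20 labels without replacement, which has the same effect as skipping repeats.

```python
import random

def select_srs(population_size=250, sample_size=20, seed=None):
    """Return a sorted simple random sample of apartment labels.

    Apartments are assumed to be labeled 1..population_size.
    The seed only makes this illustration reproducible.
    """
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    # random.sample draws without replacement, so no label can repeat
    return sorted(rng.sample(labels, sample_size))

chosen = select_srs(seed=1)
print(chosen)  # 20 distinct labels between 1 and 250
```

The apartments whose labels are printed would make up the sample.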

In (b), either word is usually fine, but “association” is the safer choice in a description question. If you use “correlation,” make sure the relationship is quantitative and roughly linear.

Part (c) is close, but AP-ready wording should explicitly say that 92.1% of the variability in weekly rent is explained by its linear relationship with walking distance to the nearest train station.

Part (d) is correct: the claimed slope of -2 dollars per minute is inside the 95% confidence interval, so the interval does not contradict the claim.

Part (e) has the correct prediction, but the original reasoning should not say “lower residual.” Mixing apartment sizes would usually increase unexplained variation and increase scatter around the regression line, leading to a smaller r-squared.
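
To see why mixing apartment sizes tends to lower r-squared, here is a rough simulation sketch. Everything in it is hypothetical except the slope of -2.158 mentioned in the responses: the base rent of 400, the bedroom premiums, the noise level, and the range of walking distances are made-up values chosen only to illustrate the idea, not data from the study.

```python
import random

def r_squared(x, y):
    """Squared correlation, which equals r-squared for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def simulate(n, bedroom_premiums, seed):
    """Simulate rents and return r-squared of rent vs. walking distance."""
    rng = random.Random(seed)
    distances, rents = [], []
    for _ in range(n):
        distance = rng.uniform(1, 30)            # hypothetical minutes of walking
        premium = rng.choice(bedroom_premiums)   # hypothetical extra rent by size
        noise = rng.gauss(0, 5)                  # hypothetical leftover variation
        rents.append(400 - 2.158 * distance + premium + noise)
        distances.append(distance)
    return r_squared(distances, rents)

r2_one_bedroom = simulate(2000, [0], seed=0)
r2_mixed_sizes = simulate(2000, [0, 150, 300], seed=0)
print(round(r2_one_bedroom, 3), round(r2_mixed_sizes, 3))
```

In this sketch the slope is the same in both runs. The only change is the extra rent variation from apartment size, which walking distance cannot explain, so the mixed-size run should show a much smaller r-squared.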

Student Response 2

a. The researcher could have numbered the 250 similarly-sized apartments from 1-250 and wrote the numbers on an equally sized piece of paper. Then, the researcher could put these slips in a hat and shake it very well. Then the researcher can pick 20 slips of paper which would represent the 20 apartments that will be used in the sample.

b. The association between walking distances from the nearest train station and weekly rent for the apartments included in the sample is strong, negative, and linear with no suspected outliers. It is strong because the correlation is r = -sqrt(0.921) ≈ -0.9597 (negative because the slope of the regression line is negative), which indicates a strong negative linear relationship. We know the data points have a negative association as the slope of the linear regression line is -2.158.

c. The coefficient of determination means that 92.1% of the variability in weekly rent among the sampled apartments is explained by the linear relationship with walking distance to the nearest train station.

d. Since the confidence interval contains -2, the researcher’s claim is not contradicted.

e. We expect the r-squared to be less than the first study. This is because we will have more variability in the weekly rent since there are more types of apartments which vary in weekly rent. This leads us to have less of an association between the weekly rent and the train station.

Teacher Feedback

Nice work overall. Parts (a), (d), and (e) are solid.

In (b), be careful with the calculation: since the slope is negative, the correlation should also be negative, so use $r = -\sqrt{0.921}$, not $+\sqrt{0.921}$.
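
As a quick check of that sign rule, here is a small sketch. The helper name `correlation_from_r_squared` is made up for illustration. It takes the square root of r-squared and gives it the same sign as the slope.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """r has magnitude sqrt(r_squared) and the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r = correlation_from_r_squared(0.921, -2.158)
print(round(r, 4))  # -0.9597
```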

In (c), this is the right idea, and the strongest version explicitly names both variables and the linear model: 92.1% of the variability in weekly rent is explained by its linear relationship with walking distance to the nearest train station.

Quick AP Stats FRQ Tips for Questions Like This

  • When describing a scatterplot, stick to direction, form, strength, and unusual features.
  • When interpreting r-squared, name the response variable, the explanatory variable, and say the variability is explained by the linear relationship/model.
  • When interpreting a confidence interval for slope, check whether the claimed slope value is inside the interval.
  • When predicting what happens to r-squared, think about whether the new study would create more unexplained variation around the regression line.
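
The confidence interval tip comes down to a simple containment check. Here is a minimal sketch using the interval from part (d). The function name `interval_contains` is an illustrative choice.

```python
def interval_contains(interval, claimed_value):
    """Return True if claimed_value lies between the interval's endpoints."""
    lower, upper = interval
    return lower <= claimed_value <= upper

slope_ci = (-2.471, -1.845)   # 95% confidence interval for the slope, from part (d)
claimed_slope = -2.0          # "rent decreases by about $2 per extra minute"

print(interval_contains(slope_ci, claimed_slope))  # True
```

Note that the claimed value is written as -2, not 2, because the claim is about a decrease in rent. A value inside the interval is plausible, so the interval does not contradict the claim.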

If you want, try rewriting parts (b), (c), and (e) yourself to make them fully exam-ready.