AP Statistics
Last Updated on July 11, 2024
Jerry Kosoff
The FRQ is a great way to prep for the AP exam! Review FRQ practice writing samples from Unit 5 and corresponding feedback from Fiveable teacher Jerry Kosoff.
A researcher in Yellowstone National Park observed the “Old Faithful” geyser for several weeks. For each eruption of the geyser, the duration from start to end, in seconds, was recorded. The histogram below summarizes the results from 421 observations. The mean of the distribution is 210 seconds, with a standard deviation of 68 seconds.
a. Describe the sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher’s observations.
b. What is the probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less?
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions has a mean of 210 seconds, and it is bimodal with peaks at 100-125 seconds and 250-275 seconds. The shape of the sampling distribution of sample mean eruption length seems to be roughly symmetrical, and the range of the sampling distribution of sample mean eruption length for random samples of 40 eruptions is no more than 250 seconds.
b) x = mean geyser eruption duration for a random sample of 40 eruptions
Conditions: Random - stated that there were random samples of 40 eruptions, 10% Rule for Independence - satisfied since there are at least 421(10) = 4210 observations of geyser eruptions, Normal/Large Sample - satisfied since n =40 >= 30; therefore, the sampling distribution of sample mean eruption duration is approximately normal.
P(x<200) = P(z<-0.14) = .4443 <–from Table A using z-score of -0.14
z = (200-210)/68 = -0.14
[pretend i drew a picture of a normal distribution here with 210 as median, 200 slightly to left of it, and everything shaded below -0.14]
The probability that the sample mean eruption duration for a random sample of 40 eruptions is 200 seconds is less is 0.4443.
In part (a), you appear to misunderstand what you’re being asked to describe. You describe the distribution provided by the histogram. However, the histogram is really providing the distribution of the “population” in this scenario; we are being asked to describe what it would look like if we took repeated samples of 40 eruptions from the graph shown and created a new graph of x-bars. Since 40 > 30, the shape of the original distribution (the one you described) doesn’t matter; the Central Limit Theorem applies and we can describe the resulting sampling distribution as approximately normal, with a mean of 210 seconds and a standard error of 68/sqrt(40) seconds [using formulas from our formula sheet].
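The center and spread described above can be checked with a few lines of Python. This is only a sketch of the arithmetic from the formula sheet; the variable names are illustrative and not part of the original feedback.

```python
import math

# Values given in the problem statement
population_mean = 210   # seconds
population_sd = 68      # seconds
n = 40                  # eruptions per sample

# The mean of the sampling distribution of x-bar equals the population mean
sampling_mean = population_mean

# The standard error of x-bar is sigma / sqrt(n)
standard_error = population_sd / math.sqrt(n)

print(f"mean = {sampling_mean} s, standard error = {standard_error:.3f} s")
```

The standard error comes out to roughly 10.75 seconds, far smaller than the 68-second spread of individual eruptions.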
The misconception in part (a) then extends to part (b) - you use the “original” standard deviation when we should instead use the standard error of 68/sqrt(40) = 10.752 seconds. This impacts the z-score you would get and, ultimately, your associated probability. In previous rubrics, you would still get partial credit for calculating the probability that you did, because you did all of your work correctly given the mistake you made.
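To see how much the choice of spread matters, here is a minimal sketch comparing the two z-scores. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for Table A or a calculator's normalcdf, so the results use unrounded z-scores and may differ slightly from table values in the last digits.

```python
import math
from statistics import NormalDist

mu, sigma, n = 210, 68, 40
standard_error = sigma / math.sqrt(n)

# z-score using the population SD (the mistake in the response)
z_wrong = (200 - mu) / sigma
# z-score using the standard error (appropriate for a sample mean)
z_right = (200 - mu) / standard_error

std_normal = NormalDist()  # standard normal: mean 0, SD 1
p_wrong = std_normal.cdf(z_wrong)
p_right = std_normal.cdf(z_right)

print(f"with sigma: z = {z_wrong:.2f}, P = {p_wrong:.4f}")
print(f"with SE:    z = {z_right:.2f}, P = {p_right:.4f}")
```

Using the standard error gives a probability of about 0.176, compared with about 0.44 when the population standard deviation is used by mistake.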
Another small thing: you had the right idea checking the 10% condition, but used the wrong numbers. We should be comparing 40 to 421 (and since 40(10) = 400 < 421, the condition is still met). I am unsure whether that would be penalized on a typical rubric.
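The 10% check above amounts to a single comparison. A small sketch, with illustrative variable names:

```python
population_size = 421  # observations recorded by the researcher
n = 40                 # sample size

# 10% condition: the sample should be no more than 10% of the population,
# which is the same as requiring 10 * n <= N
condition_met = 10 * n <= population_size

print(f"10 * n = {10 * n}, N = {population_size}, condition met: {condition_met}")
```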
a) The population is approximately normal, the value of n (40) is >= 30 by the Central Limit Theorem, and the sample shows no strong skew or outliers. The center of the sample mean is 210 seconds. The variability is 10.752 seconds because 68/sqrt(40) = 10.752.
b) P(x<=200) = ?
Use the z = (x-bar - mu)/(standard deviation/sqrt(n)) equation
(200-210)/10.752 = -0.93
Using Table A, a z-score of -0.93 is a p-value of 0.1762.
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.
In part (a), you correctly describe the shape, center, and spread of the sampling distribution, citing the Central Limit Theorem as the reason for the distribution being “approximately normal.” What you should be careful with is that you start by saying “the population is approximately normal”, when it’s the sampling distribution that is approximately normal. Unfortunately, that would sometimes be enough to lower your score by a level (from fully correct to partially correct); you’ve used the wrong statistical term.
In part (b), you do a good job of communicating the probability you are asked to find, then carry out calculations correctly and answer in context. Nice job!
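The difference between the population and the sampling distribution can be illustrated with a short simulation. This is a rough sketch only: the actual 421 observations are not given, so the bimodal population below is invented (the peak locations, spreads, and 30/70 split are all assumptions), and samples are drawn with replacement for simplicity.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL bimodal population of 421 durations (seconds); the real
# observations are not available, so these values are invented.
population = [
    random.gauss(110, 15) if random.random() < 0.3 else random.gauss(255, 20)
    for _ in range(421)
]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples of 40 (with replacement) and record each sample mean
sample_means = [
    statistics.fmean(random.choices(population, k=40))
    for _ in range(5000)
]

print(f"population: mean = {pop_mean:.1f}, SD = {pop_sd:.1f}")
print(f"x-bar:      mean = {statistics.fmean(sample_means):.1f}, "
      f"SD = {statistics.pstdev(sample_means):.1f}")
print(f"sigma / sqrt(40) = {pop_sd / 40 ** 0.5:.1f}")
```

The population stays bimodal no matter how many samples are drawn. It is the collection of sample means that clusters around the population mean with a spread close to sigma/sqrt(40).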
a.) The sampling distribution is approximately normal because according to the Central Limit Theorem, if the sampling size (40) is greater than 30, the shape is approximately normal. The mean of the sampling distribution is 210 seconds. The standard deviation of the sampling distribution is 68/sqrt(40)=10.75.
b.) P(x<200)= P(z<-0.93)= .1762
z= P(200-210)/10.75 (from part a)=-.93
There is a 17.62% chance that the sample mean eruption length for a random sample of 40 eruptions in 200 seconds or less.
Good on both parts! In part (a), you give correct descriptions for shape, center, and spread, and correctly invoke the Central Limit Theorem since n = 40 > 30. In part (b), you calculate the correct probability. Small note on notation: P(X < 200) should be P(x-bar < 200). I know that we can’t format “x-bar” on here, but you were asked about a sample mean so we need to use the appropriate symbol. That actually could be enough to bump you down a scoring level, so watch your symbols/notation carefully.
A. The sample is approximately normal due to the Central Limit Theorem (40 is greater than 30). There seems to contain not outliers. For the center, the mean of the distribution is 210. And for the spread, the standard deviation is 10.75 ( 68/ square root of 40).
B. Require Assumptions:
Sampling: There is a random sample of 40 eruptions.
Normally Distributed: 40 is greater than 30 therefore it meets the Central Limit Theorem so we can assume approximately normal.
Independence: 10(40) is less than all geyser eruptions.
The mean is 210. Standard deviation is 10.75 ( 68/ square root of 40). I then proceeded to find the z-score: 200-210/10.75=-.9302.
The probability statement is P(z is less than or equal to -.9302).
Then using my calculator I did normalcdf(-1000,-.9302,0,1) and found the p-value which is .1761.
To conclude, there’s a 17.61% chance that a random sample of 40 eruptions is 200 seconds or less.
Also I would have added a sketch to show the distribution
Well done! You’ve correctly invoked the CLT in part (a) to justify your shape being approximately normal, while giving correct measures of center and spread. Be careful - your first two words are “the sample” instead of “the sampling distribution” - there’s a big difference in those two things. There are no issues in part (b) - nice job!
For part (a), you correctly identify the shape as “approximately normal” due to the CLT (and give the correct reason, n = 40 > 30). However, a description of a distribution (of any type) should include measures of center and spread to go with shape. (Many teachers use “S.O.C.S.” or “C.U.S.S.” as acronyms to help students remember - Shape/Outliers/Center/Spread or Center/Unusual Features/Shape/Spread.) In this case, you did not mention the mean of the sampling distribution (which would still be 210) or the standard error (which would be 68/sqrt(40) = 10.75). This then impacted your probability calculation in part (b). You correctly used z-scores and did a correct calculation for what your z-score was, but would only earn partial credit for not calculating the standard error in part (a).
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher’s observations is approximately normal(random samples of 40 eruptions > 30; Central Limit theorem). The distribution has a mean(center) of 210 sec and and standard deviation (spread) of 10.75 sec ( 68/ sqrt 40).
b) The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.176.
Conditions: Random: Random sample of 40 eruptions was taken.
Independent: Random sample of 40 eruptions is less than 10% of the population; 400<421
Normal: 40 samples > 30 ; Central Limit Theorem is satisfied
Calculator: normalcdf[ Lower:0, Upper: 200, u: 210, st. dev. : 10.75] = 0.1761
Perfect all around!
a) The distribution eruption length is approximately normal as n>30 with a range of 225 seconds, a mean of 210 seconds, and a standard deviation of ~10.7517 seconds.
Do I need to mention outliers and range here?
b) According to the central limit theorem, a sample of n>=30 so our sample of 40 tells us this is approximately normal. Also, 40*10 is less than the population of 421 and we are told the sample is random.
normCdf(lower=-1e99,upper=200,μ=210,σ=10.751744)=0.176164
Part (a) has everything needed to describe a distribution (center, shape, and spread are mentioned, so no need for outliers/range), though you should show where the 10.75 seconds calculation came from. In part (b), you’ve done all appropriate calculations.
a. The sampling distribution of 40 random samples of eruption would be approximately normal. The distribution of 40 random samples would be centered at the mean of 210. The shape of the distribution would be bell-shaped and approximately symmetrical. The sampling distribution would be spread with a standard deviation of 68/sqrt(40) = 10.752. The sampling distribution would not have any unusual features or gaps.
b. Assumptions:
-We have a random sample of geyser eruptions.
-Population of eruptions is at least 400.
-Since the sample size is large enough (n>30) due to CLT, the sampling distribution is approximately normal
-Sigma_d is known
Calculations
p(x_bar ≤ 200) = normalcdf(-1E99, 200, 210, 10.752) = 0.1762
Conclusion
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.
Solid work! The only possible issue: in part (a), you mention the shape as “approximately normal”, but don’t give the reason for that until part (b) (since n = 40 > 30, the CLT applies). On some rubrics, we’d be able to give you retroactive credit for part (a) based on that description in part (b), but it’s always safe to show that the CLT applies whenever you’re citing a sampling distribution of x-bar being approximately normal.
a. The sampling distribution of sample mean eruption length for random samples from the researchers observations is approximately normal 40>30 so CLT applies, it is centered at mu=210 and the samples were chosen randomly and the spread is 68/sqrt40=10.75 and the sample of 40(10)=400 which is less than the 421 total observations.
b. 200-210/10.75=.93
p=.1762
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is .1762.
Perfect! You mention center, shape, and spread in part (a) - correctly applying the CLT - and do appropriate calculations in part (b).
a. The sampling distribution of the sample mean eruption length for random samples of 40 eruptions from the researcher’s observations can be described as approximately normal (by the Central Limit Theorem, as n = 40 >30 and is thus sufficiently large) with a mean of 210 seconds and a standard deviation of 68/sqrt(40) = 10.7517 seconds (N(210, 10.7517))).
b. Let us define the continuous random variable X as N(210, 10.7517) (from the description of the sampling distribution of the sample mean eruption length in part a)
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is: P(X<200) = normcdf(lowerbound = -infinity, upperbound = 200, mu = 210, sigma = 10.7517) = 0.176164.
Perfect! You’ve correctly justified with the CLT and then done appropriate calculations in part (b).
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions has a mean of 210 seconds and a standard deviation of 10.75 (68/sqrt40). It is normally distributed as the central limit theorem is applicable because n is greater than or equal to 30 (n=40). Because the sampling distribution is normally distributed, its shape can be described as symmetrical and the distribution shows no evidence of skews or outliers.
b) Conditions: There is a random sample of 40 eruptions. The sample is normally distributed because n>30 which meets the central limit theorem.
P(X <= 40)=?
z=200-210/(68/sqrt40)= -.93
P(z<= 200): normalcdf(-9999,-.93,0,1) = .176
The probability that the sample mean eruption length of a random sample of 40 eruptions is 200s or less is .176
Nice work. It’s a small thing, but it matters: when using the CLT we have to say approximately normal. The sampling distribution of x-bar from a non-normal population is never perfectly normal at any finite sample size. It would result in partial credit on an otherwise perfect response.
a. Since the sample is large (n=40 above 30), we can approximate our sample distribution with a normal curve. Since the data is from a random sample, the population mean is the sample mean (210). We can assume there are more than 10(40)=400 eruptions (10% rule). We have the sample standard deviation equal to population standard deviation divided by the square root of n. We have the sample standard deviation of 68/sqrt(40)=10.752.
b. We let x be the mean duration of a random sample of 40 eruptions. We want to find P(x≤200), which is equivalent to P(z≤(200-210)/10.752=-0.93). We use normal cdf with a lower bound of -∞, a higher bound of -0.93, a mean of 0, and a standard deviation of 1, and we get 0.176. The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.176.
Correct on both parts. While you don’t say “Central Limit Theorem” in part (a), you justify the shape (approximately normal) based on the sample size, so that will count.