A data scientist wants to report a two-sided 95% confidence interval for the true population Pearson correlation between two numerical features. In a random sample of n = 60 observations, the sample correlation is r = 0.58. To use standard normal critical values, which pre-processing step should be applied to the correlation estimate before constructing the confidence interval?
Transform r with Fisher's inverse hyperbolic tangent (z-transformation), build the interval in the transformed space, then back-transform the interval's endpoints.
Use a Box-Cox transformation on each variable so that the resulting correlation can be treated as normally distributed.
Apply the Wilson score method directly to r to obtain the interval.
Multiply r by √(n−2)/√(1−r²) and treat the result as standard normal when forming the interval.
Because the sampling distribution of Pearson's r is skewed and its variance depends on the unknown population correlation ρ, a normal-theory interval built directly on r is inappropriate. Fisher's z-transformation, z = atanh(r) = ½ ln[(1+r)/(1−r)], is a variance-stabilizing transform: z is approximately normally distributed as N(atanh(ρ), 1/(n−3)). A 95% interval for atanh(ρ) is therefore z ± 1.96/√(n−3), and applying the inverse transform (tanh) to the endpoints yields the confidence interval for ρ. The Wilson score interval is designed for binomial proportions, not correlations. A Box-Cox transformation is applied to the raw data, not to the correlation coefficient r, and it does not make r itself normally distributed. The statistic r√(n−2)/√(1−r²) follows a t-distribution with n−2 degrees of freedom only under the null hypothesis ρ = 0, so it is used to test for zero correlation rather than to build an interval for ρ, and it is not standard normal.
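As a rough illustration of the steps in the explanation, the sketch below applies them to the values in the question (r = 0.58, n = 60). The function name `fisher_ci` and its parameters are hypothetical names chosen for this example, and it uses only the Python standard library.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate CI for a population Pearson correlation via Fisher's z.

    Hypothetical helper for illustration; assumes -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)                 # Fisher transform: 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    # Two-sided standard normal critical value (about 1.96 for 95%)
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo_z, hi_z = z - crit * se, z + crit * se
    # Back-transform the endpoints to the correlation scale
    return math.tanh(lo_z), math.tanh(hi_z)


lo, hi = fisher_ci(0.58, 60)
print(round(lo, 2), round(hi, 2))  # 0.38 0.73
```

The resulting interval is not symmetric around r = 0.58, which reflects the skew of the sampling distribution that the transformation is meant to handle.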