A data scientist is studying the association between years of professional experience (X) and an annual job-performance index (Y) for 350 software engineers. The scatterplot shows:
Y rises as X increases, but the slope flattens after about 10 years, producing a positive monotonic yet visibly nonlinear trend.
Several extreme performers (both unusually high and low Y for their X) appear as outliers.
The analyst wants a single summary statistic that (1) quantifies the direction and strength of the relationship without assuming linearity and (2) reduces the influence of the outliers by working with ranks rather than raw values.
Which statistic best meets these requirements?
Spearman correlation coefficient
Pearson product-moment correlation coefficient
Log-transform the performance index and then compute the Pearson correlation
Fit a quadratic regression model and report its R-squared value
The Spearman correlation coefficient is the most appropriate because it converts the data to ranks and measures the strength and direction of a monotonic relationship. By operating on ranks, it greatly limits the effect of extreme values and does not require a linear pattern.
The Pearson product-moment correlation measures the strength of a linear relationship and is computed from raw values; it therefore understates a curved monotonic trend and can be distorted by the outliers.
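The R-squared from a quadratic regression summarizes how well that particular model fits, not the monotonic association itself. It carries no sign, so it does not convey direction, and because it is computed from raw values rather than ranks it remains sensitive to influential points.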
Log-transforming Y before computing Pearson correlation may straighten part of the curve, but the result still relies on a linear model of the transformed data and remains susceptible to outliers, failing to meet the analyst's stated goals.
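The contrast between the two coefficients can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 20 synthetic data points, the saturating curve used for Y, the single injected outlier, and the helper names `average_ranks`, `pearson`, and `spearman`. It computes Spearman's coefficient as the Pearson correlation of the ranks, using only the Python standard library.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: experience in years and a performance index that
# rises quickly and then flattens (monotonic but nonlinear).
years = list(range(1, 21))
perf = [100 * (1 - math.exp(-t / 5)) for t in years]

# Same data with one extreme high performer injected at 20 years.
perf_outlier = perf[:-1] + [1000.0]

r_clean, rho_clean = pearson(years, perf), spearman(years, perf)
r_out, rho_out = pearson(years, perf_outlier), spearman(years, perf_outlier)

print(f"clean data:   Pearson={r_clean:.3f}  Spearman={rho_clean:.3f}")
print(f"with outlier: Pearson={r_out:.3f}  Spearman={rho_out:.3f}")
```

In this deliberately simple case the outlier is already the largest Y value, so the ranks do not change and Spearman's coefficient is unaffected, while Pearson's coefficient moves noticeably. An outlier that reorders the ranks would shift Spearman's coefficient as well, but only by as much as that one point's rank can change, not in proportion to how extreme its raw value is.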