A data scientist creates a Pearson correlation heat map for 20 continuous process-control variables collected from an industrial line. In the plot, three separate pairs of predictors show absolute correlation coefficients greater than 0.95, whereas all remaining pairwise correlations are below 0.40. The analyst plans to build an ordinary least squares (OLS) regression model and is worried about unstable coefficient estimates caused by multicollinearity. Based solely on the information provided by the correlation plot, which next step is the most appropriate before training the model?
Treat the highly correlated pairs as key drivers of the target variable and keep all predictors unchanged.
Replace the Pearson coefficients in the heat map with Spearman rank correlations, which will eliminate multicollinearity.
Remove one variable from each pair with |r| > 0.95 to reduce redundancy before fitting the regression model.
Standardize every predictor to zero mean and unit variance; this resolves multicollinearity without removing features.
When two predictors have an absolute Pearson correlation near 1, they convey almost the same linear information. Keeping both in a linear model inflates standard errors and can make coefficient signs and magnitudes unstable. The usual first remedy is to eliminate the redundancy by retaining only one variable from each highly correlated pair (or by combining the pair, for example into a single average or principal component), thereby reducing the multicollinearity risk. Standardizing variables rescales them but does not change their correlation structure, so the multicollinearity remains. Switching from Pearson to Spearman correlation might be useful for detecting monotonic relationships, but it does not remove the underlying linear dependency between the predictors. Interpreting high predictor-predictor correlations as evidence about the dependent variable is incorrect because the plot does not involve the target at all.
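As an illustration, the following minimal sketch uses pandas and NumPy to flag and drop one member of each pair with |r| > 0.95, and then checks that standardizing the columns leaves the Pearson correlation matrix unchanged. The simulated DataFrame `X`, its column names, and the sample size are illustrative assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Simulated process data: x2 nearly duplicates x1, and x4 nearly duplicates x3.
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=n),  # |r| with x1 close to 1
    "x3": x3,
    "x4": x3 + rng.normal(scale=0.05, size=n),  # |r| with x3 close to 1
    "x5": rng.normal(size=n),                   # unrelated noise column
})

# Absolute Pearson correlations between the predictors.
corr = X.corr(method="pearson").abs()

# Keep only the upper triangle so each pair is inspected once,
# then drop one member of every pair with |r| > 0.95.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_reduced = X.drop(columns=to_drop)
print("Dropped:", to_drop)  # e.g. ['x2', 'x4']

# Standardizing rescales each column but does not change the correlation matrix.
X_std = (X - X.mean()) / X.std()
print("Correlations unchanged after standardizing:",
      np.allclose(X.corr(), X_std.corr()))  # True
```

Which member of a pair to keep is a modeling choice; domain knowledge, measurement cost, or interpretability can guide it, since either variable carries essentially the same linear information.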