A machine-learning team is performing exploratory analysis on a regression dataset that contains 150 numeric predictors gathered from IoT sensors. They need a single visualization that will (1) display the absolute Pearson correlation coefficient for every pair of predictors, (2) automatically reorder the predictors so that highly correlated variables appear next to each other, and (3) make it easy to spot rectangular blocks of strong correlation that could indicate multicollinearity. Which chart is the most appropriate choice for this task?
A scatter-plot matrix (pair plot) of all predictor pairs
A multi-line time-series chart showing the variance of each predictor across observations
A series of box-and-whisker plots, one per predictor, ordered by median value
A clustered correlation heat map with hierarchical clustering applied to rows and columns
A clustered correlation heat map displays the full matrix of pairwise correlation coefficients, with the magnitude of each coefficient encoded as color, which directly satisfies requirement 1. Applying hierarchical clustering to both rows and columns reorders the matrix so that predictors with similar correlation profiles sit next to each other. Groups of mutually correlated predictors then show up as rectangular blocks of strong color along the diagonal, which addresses requirements 2 and 3 and makes candidate multicollinear feature groups immediately apparent. A scatter-plot matrix can also reveal correlations, but with 150 predictors it becomes a 150 × 150 grid of panels (11,175 distinct pairs), which is unreadable, and it does not reorder variables to group related ones. Box-and-whisker plots summarize each predictor's univariate distribution, and a multi-line variance chart shows how each predictor's variance changes across observations; neither displays pairwise correlation, so neither can expose multicollinearity.
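The reordering step described above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `clustered_abs_correlation` is hypothetical, the distance 1 − |r| and average linkage are illustrative choices (other metrics and linkage methods are common), and the demo uses synthetic data with three hidden groups of correlated predictors instead of real sensor readings.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def clustered_abs_correlation(X: np.ndarray, method: str = "average"):
    """Return the |Pearson r| matrix of the columns of X, reordered by
    hierarchical clustering, plus the column order that was used.

    Assumption: distance between predictors is defined as 1 - |r|.
    """
    # Pairwise Pearson correlation between predictors (columns), as magnitudes.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Turn similarity into distance: strongly correlated predictors are "close".
    dist = 1.0 - corr
    dist = (dist + dist.T) / 2.0  # guard against tiny floating-point asymmetry
    np.fill_diagonal(dist, 0.0)
    # linkage() expects a condensed (upper-triangle) distance vector.
    condensed = squareform(dist, checks=False)
    # Agglomerative clustering; leaves_list() gives the dendrogram leaf order.
    order = leaves_list(linkage(condensed, method=method))
    # Reorder rows and columns together so correlated groups become blocks.
    return corr[np.ix_(order, order)], order


# Synthetic demo: three hidden factors, each driving a group of predictors.
rng = np.random.default_rng(0)
n_obs, group_sizes = 500, [4, 3, 5]
latent = rng.normal(size=(n_obs, len(group_sizes)))
cols, labels = [], []
for g, size in enumerate(group_sizes):
    for _ in range(size):
        cols.append(latent[:, g] + 0.3 * rng.normal(size=n_obs))
        labels.append(g)

# Shuffle the columns so the groups are not adjacent in the raw data.
perm = rng.permutation(len(cols))
X = np.column_stack(cols)[:, perm]
labels = np.array(labels)[perm]

corr_sorted, order = clustered_abs_correlation(X)
print("group label of each predictor after reordering:", labels[order])
```

After reordering, predictors from the same hidden group occupy consecutive positions, so plotting `corr_sorted` (for example with matplotlib's `imshow`) shows one square block per group on the diagonal. For the final chart, seaborn's `clustermap` performs the clustering and draws the reordered heat map with dendrograms in one call; note that by default it clusters rows of whatever matrix it is given by Euclidean distance, so its ordering may differ slightly from the 1 − |r| choice assumed here.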