A data scientist is constructing a feature matrix where the existing feature vectors are linearly independent. A new feature vector is engineered, which is a linear combination of two of the original vectors. This new vector is then appended as a new column to the matrix. Which statement correctly describes the primary consequence of this action on the properties of the feature matrix?
The new vector replaces one of the original vectors in the basis, resolving a deficient rank problem in the original matrix.
The span of the column space is unaffected, which improves the numerical stability of subsequent model coefficient estimations.
The span of the column space remains unchanged, but perfect multicollinearity is introduced.
The span of the column space expands to a higher dimension because an additional vector has been introduced.
The correct answer is that the span of the column space remains unchanged, but perfect multicollinearity is introduced. The span of a set of vectors is the set of all linear combinations of those vectors. Because the new feature vector is itself a linear combination of existing columns, it already lies in the original span and adds no new dimension: the matrix gains a column, but its rank stays the same, so it no longer has full column rank. That exact linear dependence among the columns is perfect multicollinearity. Under it, ordinary least squares has no unique solution, because the matrix X^T X is singular and infinitely many coefficient vectors fit the data equally well. The other options are incorrect: the span does not expand; a dependent column degrades rather than improves the stability of coefficient estimates; and the original matrix had no rank deficiency to resolve, since its columns were linearly independent, nor does the new vector replace any existing column.
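The rank argument in the explanation can be checked numerically. The following is a minimal sketch using only the Python standard library; the feature values, the coefficients 2 and 3, and the helper names `rank` and `as_rows` are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below the next pivot row.
        pivot = next((r for r in range(pivots, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[pivots], m[pivot] = m[pivot], m[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, n_rows):
            factor = m[r][col] / m[pivots][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[pivots])]
        pivots += 1
    return pivots


def as_rows(columns):
    """Turn a list of feature columns into a row-major matrix."""
    return [list(row) for row in zip(*columns)]


# Hypothetical data: three linearly independent features, four observations.
x1 = [1, 0, 2, 1]
x2 = [0, 1, 1, 3]
x3 = [1, 1, 0, 2]

# Engineered feature: an exact linear combination of two existing columns.
x_new = [2 * a + 3 * b for a, b in zip(x1, x2)]

original = as_rows([x1, x2, x3])
augmented = as_rows([x1, x2, x3, x_new])

rank_original = rank(original)
rank_augmented = rank(augmented)

print("columns:", len(original[0]), "rank:", rank_original)
print("columns:", len(augmented[0]), "rank:", rank_augmented)
```

The augmented matrix has four columns but the same rank as the original, which is what "the span is unchanged but the columns are now linearly dependent" means in practice. Exact `Fraction` arithmetic is used here so the rank is not affected by floating-point tolerance; with real-valued data a tolerance-based routine would be the usual choice.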