While preparing an offline batch solution for a multivariate linear-regression model, you build a design matrix X whose rows represent 500 observations and whose columns represent 20 standardized features. You plan to solve the normal equation β̂ = (XᵀX)⁻¹ Xᵀ y to obtain the coefficient vector.
Which statement correctly characterizes the matrix XᵀX and explains why a Cholesky factorization is an efficient way to compute β̂?
XᵀX is a 20 × 20 symmetric positive-definite matrix; exploiting this property with a Cholesky decomposition roughly halves the work compared with a general LU factorization.
XᵀX is a 500 × 500 matrix, so Cholesky is infeasible because it only works on triangular matrices, making LU the necessary choice.
After feature standardization, XᵀX always has determinant one, so the type of factorization has no impact on computational efficiency.
XᵀX is a 20 × 500 rectangular matrix that cannot be inverted, so any factorization method will fail.
Multiplying the transpose of the design matrix X (shape 500 × 20) by X itself produces XᵀX with shape 20 × 20. Because XᵀX is the Gram matrix of X, it is symmetric and, assuming X has full column rank, positive definite. Cholesky decomposition is tailor-made for such matrices: it uses symmetry and positive definiteness to factor XᵀX into LLᵀ with roughly n³/3 floating-point operations, about half the 2n³/3 needed for a general LU factorization. Therefore, the statement identifying XᵀX as a 20 × 20 SPD matrix and citing the computational benefit of Cholesky is correct. The alternatives misstate the matrix dimensions, its invertibility, or the reason the choice of factorization matters.
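As a rough illustration of the explanation, the sketch below builds a synthetic 500 × 20 standardized design matrix, forms XᵀX, and solves the normal equations through a Cholesky factor. The data, the seed, and the variable names (`G`, `beta_chol`, `beta_lstsq`) are assumptions made for this example only and are not part of the question. It uses NumPy alone, so the two triangular systems go through the general-purpose `np.linalg.solve`; a dedicated triangular solver would be needed to realize the full savings.

```python
import numpy as np

# Synthetic data for illustration only (assumed, not from the question).
rng = np.random.default_rng(0)
n_obs, n_feat = 500, 20
X = rng.standard_normal((n_obs, n_feat))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature column
beta_true = rng.standard_normal(n_feat)
y = X @ beta_true + 0.1 * rng.standard_normal(n_obs)

# Gram matrix: (20 x 500) @ (500 x 20) -> 20 x 20, symmetric.
G = X.T @ X

# Cholesky factor: G = L @ L.T with L lower triangular.
# np.linalg.cholesky raises LinAlgError if G is not positive definite,
# which happens when X lacks full column rank.
L = np.linalg.cholesky(G)

# Solve G beta = X^T y in two steps: L z = X^T y, then L^T beta = z.
z = np.linalg.solve(L, X.T @ y)
beta_chol = np.linalg.solve(L.T, z)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(G.shape)
print(np.allclose(beta_chol, beta_lstsq))
```

Note that the code never forms (XᵀX)⁻¹ explicitly; solving the two triangular systems gives β̂ directly. Standardized features are still not guaranteed to be linearly independent, so the factorization can fail on real data with collinear columns.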