During maximum-likelihood estimation of a d-dimensional multivariate Gaussian \(\mathcal{N}(\mu, \Sigma)\), you need the gradient of the logarithmic normalization term
\[f(\Sigma)=\log\det\Sigma\]
with respect to the symmetric positive-definite covariance matrix \(\Sigma\). Which expression gives the correct matrix derivative \(\partial f/\partial\Sigma\)?
Jacobi's formula states that the differential of the determinant is \(d\det A = \det(A)\,\operatorname{tr}(A^{-1}\,dA)\). Applying the chain rule to \(\log\det\Sigma\) yields \[d\bigl(\log\det\Sigma\bigr)=\frac{1}{\det\Sigma}\,d\det\Sigma=\operatorname{tr}(\Sigma^{-1}\,d\Sigma).\] Because the trace inner product \(\operatorname{tr}(X^{\top}Y)\) identifies the gradient, the matrix that satisfies \(\operatorname{tr}\bigl((\partial f/\partial\Sigma)^{\top}\,d\Sigma\bigr)=\operatorname{tr}(\Sigma^{-1}\,d\Sigma)\) for every perturbation \(d\Sigma\) is \(\Sigma^{-\top}\). For a symmetric positive-definite matrix, \(\Sigma^{-\top}=\Sigma^{-1}\), so the gradient simplifies to \(\Sigma^{-1}\). The other options either ignore the transpose, scale by an incorrect constant, or mistake the role of the determinant.
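The identity \(\partial(\log\det\Sigma)/\partial\Sigma=\Sigma^{-1}\) can be checked numerically. The sketch below (an illustration, not part of the original question) builds a random symmetric positive-definite matrix and compares the analytic gradient \(\Sigma^{-1}\) against a central finite-difference approximation of \(\log\det\Sigma\) taken entrywise; `slogdet` is used for a numerically stable log-determinant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct a random symmetric positive-definite covariance matrix:
# A @ A.T is PSD, and the diagonal shift makes it strictly positive-definite.
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)

def f(S):
    # log det S via slogdet, which avoids overflow/underflow in det(S).
    sign, logdet = np.linalg.slogdet(S)
    return logdet

# Analytic gradient from the derivation: Sigma^{-1}.
analytic = np.linalg.inv(Sigma)

# Central finite differences, perturbing one entry at a time.
# (For a general matrix this recovers Sigma^{-T}; here Sigma is
# symmetric, so the result coincides with Sigma^{-1}.)
eps = 1e-5
numeric = np.zeros_like(Sigma)
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = 1.0
        numeric[i, j] = (f(Sigma + eps * E) - f(Sigma - eps * E)) / (2 * eps)

print(np.max(np.abs(numeric - analytic)))  # should be tiny
```

Note that each single-entry perturbation makes the argument momentarily nonsymmetric; the determinant is still well defined there, and the entrywise derivative of \(\log\det A\) is \((A^{-\top})_{ij}\), which is exactly the transpose step in the derivation above.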