As part of implementing a single-output neural network layer, you define the pre-activation value as z = Wᵀx + b with W, x ∈ ℝⁿ and compute the prediction y_hat = σ(z), where σ is the logistic sigmoid. For one training sample you use the squared-error loss function
L = ½ (y_hat − y)²
Using the multivariate chain rule, which expression gives the gradient ∂L/∂W?
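For concreteness, here is a minimal NumPy sketch of the forward pass described above (the function and variable names are illustrative, not part of the original question):

```python
import numpy as np

def forward(W, x, b, y):
    """Forward pass for the single-output layer: returns z, y_hat, and the loss."""
    z = W @ x + b                      # pre-activation: z = Wᵀx + b
    y_hat = 1.0 / (1.0 + np.exp(-z))   # logistic sigmoid: y_hat = σ(z)
    loss = 0.5 * (y_hat - y) ** 2      # squared-error loss: L = ½(y_hat − y)²
    return z, y_hat, loss
```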
By the multivariate chain rule, ∂L/∂W = (∂L/∂y_hat) · (∂y_hat/∂z) · (∂z/∂W). Evaluating each factor:
∂L/∂y_hat = y_hat − y (derivative of the ½ (y_hat − y)² loss with respect to the prediction).
∂y_hat/∂z = y_hat (1 − y_hat) (derivative of the sigmoid activation).
∂z/∂W = x (because z = Wᵀx + b is linear in W).
Multiplying the three factors gives ∂L/∂W = (y_hat − y) y_hat (1 − y_hat) x. The correct option includes all three factors and preserves the sign; the other choices omit one of the factors or place terms in the wrong order, so they do not represent the full chain-rule derivative.
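As a quick sanity check of this result, the sketch below (with illustrative values that are not part of the original question) compares the analytic gradient (y_hat − y) y_hat (1 − y_hat) x against a central finite-difference estimate of ∂L/∂W:

```python
import numpy as np

def loss(W, x, b, y):
    """L = ½(σ(Wᵀx + b) − y)² for a single training sample."""
    y_hat = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return 0.5 * (y_hat - y) ** 2

rng = np.random.default_rng(0)
W, x = rng.normal(size=3), rng.normal(size=3)   # illustrative sample values
b, y = 0.1, 1.0

# Analytic gradient from the chain rule: (y_hat − y) · y_hat(1 − y_hat) · x
y_hat = 1.0 / (1.0 + np.exp(-(W @ x + b)))
grad_analytic = (y_hat - y) * y_hat * (1 - y_hat) * x

# Central finite-difference estimate of ∂L/∂W, one component at a time
eps = 1e-6
grad_numeric = np.array([
    (loss(W + eps * e, x, b, y) - loss(W - eps * e, x, b, y)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_analytic, grad_numeric))  # expected: True
```

The agreement between the two estimates confirms that all three factors are needed; dropping any one of them would break the check.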