A data scientist implements a multilayer perceptron with three hidden layers, but mistakenly sets every neuron's activation function to the identity mapping f(x)=x instead of a non-linear function such as ReLU. After training, the network behaves exactly like a single-layer linear regression, regardless of how many hidden units it contains. Which explanation best describes why the network loses expressive power in this situation?
Identity activations implicitly impose strong L2 regularization on the weights, preventing the model from fitting non-linear patterns.
Identity activations force all bias terms to cancel during forward propagation, eliminating the offsets needed for non-linear decision boundaries.
Using identity activations makes every weight matrix symmetric and rank-deficient, restricting the network to learn only linear relationships.
Composing purely affine transformations (weights and bias) produces another affine transformation, so without a non-linear activation every layer collapses into one overall linear mapping of the inputs.
Each artificial neuron normally performs two operations: an affine transformation (weights · input + bias) followed by a non-linear activation. If the activation is the identity function, every layer reduces to an affine mapping alone. The composition of affine mappings is itself an affine mapping, so the whole network collapses to a single affine function of the inputs, no matter how many layers or hidden units it has. Without a non-linear activation, the model cannot form curved decision boundaries or approximate complex non-linear functions. The other statements are incorrect: identity activations do not force biases to cancel, do not make weight matrices symmetric or rank-deficient, and do not impose implicit L2 regularization, so none of them explains the observed linear behavior.
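This collapse can be verified numerically. The sketch below, with arbitrary layer sizes and random weights chosen purely for illustration, builds a 3-hidden-layer MLP with identity activations and shows that its forward pass equals a single precomputed affine map W_total · x + b_total:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 inputs, three hidden layers of 8, 2 outputs.
sizes = [4, 8, 8, 8, 2]
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.standard_normal(m) for m in sizes[1:]]

def forward(x):
    """Forward pass with identity activations: each layer is only affine."""
    h = x
    for W, b in zip(Ws, bs):
        h = W @ h + b  # f(x) = x, so no non-linearity is applied
    return h

# Fold the whole stack into one affine map y = W_total @ x + b_total.
W_total = np.eye(sizes[0])
b_total = np.zeros(sizes[0])
for W, b in zip(Ws, bs):
    W_total = W @ W_total
    b_total = W @ b_total + b

x = rng.standard_normal(sizes[0])
print(np.allclose(forward(x), W_total @ x + b_total))  # True
```

The deep network and the single affine map agree to numerical precision, which is exactly why the trained model behaves like linear regression.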