A data scientist is designing a Multilayer Perceptron (MLP) to model a highly complex, non-linear relationship within a dataset. The initial prototype, which is functionally equivalent to a simple perceptron with a linear activation function, exhibits high bias and is unable to capture the underlying patterns. To fundamentally enhance the model's capacity to learn these non-linear relationships, which architectural modification is the most critical?
Applying L2 regularization to the weights of the hidden and output layers.
Replacing the stochastic gradient descent (SGD) optimizer with an adaptive optimizer like Adam.
Introducing one or more hidden layers with non-linear activation functions.
Increasing the number of neurons in the input layer to match the number of features.
The correct answer is to introduce one or more hidden layers with non-linear activation functions. An MLP without hidden layers, or one whose layers all use linear activation functions, is mathematically equivalent to a single linear model and therefore cannot capture non-linear patterns. The Universal Approximation Theorem states that an MLP with at least one hidden layer, a non-linear activation function, and enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy, which is what gives the architecture the power to model complex relationships.
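To illustrate why the non-linearity is the critical ingredient, here is a minimal NumPy sketch (the shapes and random weights are illustrative, not from the question): two stacked linear layers collapse into a single linear map, whereas inserting a ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))               # 4 samples, 3 features

# Two "layers" with purely linear activations...
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))
linear_stack = x @ W1 @ W2                # ...collapse to one linear map
W_equivalent = W1 @ W2
print(np.allclose(linear_stack, x @ W_equivalent))        # True

# Inserting a non-linear activation (ReLU) between the layers breaks the collapse
relu = lambda z: np.maximum(z, 0.0)
nonlinear_stack = relu(x @ W1) @ W2
print(np.allclose(nonlinear_stack, x @ W_equivalent))     # False in general
```

However many linear layers are stacked, the composition stays linear; the non-linear activation is what lets depth add representational power.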
Matching the number of input neurons to the number of features is simply how the input layer is defined when setting up any neural network; it does not add non-linear modeling capability.
Using an adaptive optimizer like Adam can speed up convergence and improve the training process, but it cannot change the fundamental representational capacity of the model's architecture. An optimizer finds the best parameters for a given architecture; it does not make a linear architecture non-linear.
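A brief PyTorch-style sketch of this point (assuming PyTorch is available; the model and data here are purely illustrative): swapping SGD for Adam changes only the parameter-update rule, while the purely linear model can still only represent functions of the form y = Wx + b.

```python
import torch
import torch.nn as nn

# A purely linear "MLP" prototype: no hidden layers, no non-linear activation
model = nn.Linear(in_features=3, out_features=1)

# Swapping the optimizer changes how parameters are updated, not the hypothesis class
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

x = torch.randn(8, 3)
y = torch.sin(x).sum(dim=1, keepdim=True)   # a non-linear target

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()   # finds better parameters for the same linear function family
```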
L2 regularization is a technique to prevent overfitting by penalizing large weights, which helps with generalization. It is used to control complexity, not to create the capacity for non-linear modeling.
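As a minimal sketch (assuming a mean-squared-error loss and an illustrative penalty strength), L2 regularization simply adds a term to the objective that grows with the squared weights; it shrinks parameters within the existing architecture rather than changing what the architecture can represent.

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on the weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    l2_penalty = lam * np.sum(weights ** 2)
    return mse + l2_penalty
```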