A machine learning engineer is training a Multilayer Perceptron (MLP) for a complex non-linear regression task. The model exhibits high bias, indicated by poor performance on both the training and validation sets. The engineer suspects the network's architecture lacks the necessary capacity to model the underlying function. Which of the following architectural changes is the most appropriate next step, and what is the fundamental reason for its effectiveness?
Increase the number of hidden layers to allow the network to learn a hierarchical composition of features, enabling it to approximate more complex functions.
Add a dropout layer with a high dropout rate after each hidden layer to ensure the model generalizes better.
Replace the non-linear activation functions in the hidden layers with linear functions to reduce computational complexity and simplify the model's learning process.
Decrease the number of neurons in each existing hidden layer to enforce the principle of Occam's razor and create a more parsimonious model.
The correct answer is to increase the number of hidden layers. A model with high bias is underfitting the data, which means it is too simple to capture the underlying patterns. Adding more hidden layers (increasing the model's depth) is a primary strategy to increase model complexity and capacity. The fundamental reason this is effective is that each successive hidden layer can learn more complex and abstract features by building upon the representations learned by the preceding layers. This process, known as hierarchical feature learning, is what gives deep neural networks their power to approximate highly complex, non-linear functions.
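As a rough illustration, adding hidden layers is a small architectural change. The sketch below assumes PyTorch; the input size of 10, the layer width of 64, and the ReLU activations are placeholder choices for the example, not values implied by the question.

```python
# Minimal sketch (PyTorch assumed): deepening an MLP to increase capacity.
# Widths, input size, and activation choice are illustrative only.
import torch.nn as nn

# Underfitting baseline: a single hidden layer.
shallow_mlp = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Higher-capacity variant: extra hidden layers let later layers compose the
# features learned by earlier ones (hierarchical feature learning).
deep_mlp = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
```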
The option to add dropout is incorrect. Dropout is a regularization technique used to combat overfitting (high variance), not underfitting (high bias). Applying aggressive dropout to an underfitting model would likely reduce its capacity further and worsen the problem. While some advanced techniques like 'early dropout' exist, standard dropout is a countermeasure for overfitting.
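For contrast, a sketch of what that incorrect option would look like (same assumed sizes as above, with an arbitrary high rate of 0.5) makes the capacity cost visible: during training, roughly half of each hidden layer's activations are zeroed out.

```python
# Sketch of the incorrect option (PyTorch assumed; sizes and rate are illustrative):
# aggressive dropout after each hidden layer further reduces effective capacity,
# which helps against overfitting but worsens underfitting.
import torch.nn as nn

over_regularized_mlp = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)
```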
The suggestion to replace non-linear activation functions with linear ones is fundamentally incorrect. A sequence of linear transformations is mathematically equivalent to a single linear transformation. This change would cause the entire multi-layer network to collapse into a simple linear model, drastically reducing its capacity to learn non-linear relationships and making the underfitting even more severe.
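A quick numerical check makes the collapse explicit. This is a NumPy sketch with arbitrary shapes: two stacked linear layers without an activation are exactly one linear layer.

```python
# Numerical check: W2 @ (W1 @ x + b1) + b2 equals a single linear layer
# with weights W2 @ W1 and bias W2 @ b1 + b2. Shapes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((64, 10)), rng.standard_normal(64)
W2, b2 = rng.standard_normal((1, 64)), rng.standard_normal(1)
x = rng.standard_normal(10)

two_layer = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)

assert np.allclose(two_layer, collapsed)  # identical outputs: no added capacity
```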
Decreasing the number of neurons is also incorrect. This action would reduce the model's capacity, making it even simpler and thus exacerbating the high bias problem. The principle of Occam's razor suggests preferring a simpler model only when it achieves similar performance to a more complex one; it does not advocate for making an already underperforming model even simpler.