A machine learning engineer is designing a Multilayer Perceptron (MLP) for a multi-class classification problem with ten distinct, mutually exclusive categories. A critical requirement is that the network's output layer produces a vector of values that can be interpreted as a probability distribution over the ten classes. Which activation function should be implemented in the output layer to meet this requirement?
The correct option is Softmax. The Softmax function is specifically designed for multi-class classification tasks where an input must be assigned to one of several mutually exclusive classes. It converts a vector of raw, real-valued scores (logits) from the final layer into a probability distribution. The resulting values are all between 0 and 1 and sum to 1, directly representing the probability of the input belonging to each class.
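A minimal sketch of this behavior, using NumPy and illustrative logit values (not taken from any specific model), shows the Softmax outputs forming a valid probability distribution:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores (logits) from the final layer of a 10-class classifier.
logits = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.9])
probs = softmax(logits)

print(probs)           # every value lies between 0 and 1
print(probs.sum())     # sums to 1.0 -> a valid probability distribution
print(probs.argmax())  # index of the predicted class
```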
Sigmoid: This function is incorrect because it is primarily used for binary classification, mapping a single output to a probability between 0 and 1. When used with multiple neurons for multi-label classification, it treats each class probability independently; the outputs do not sum to 1, which violates the requirement for a single probability distribution over mutually exclusive classes.
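A quick illustration of this independence, applying an element-wise Sigmoid to the same hypothetical logits as above, shows why the result is not a distribution over the classes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.9])
per_class = sigmoid(logits)

print(per_class)        # each value is in (0, 1), but computed independently per class
print(per_class.sum())  # generally != 1, so not a probability distribution over the classes
```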
Rectified Linear Unit (ReLU): This function is incorrect for this use case. ReLU is the standard activation function for hidden layers in deep learning models, not the output layer for classification. Its output range is [0, ∞), so individual values can exceed 1 and the outputs are not normalized, meaning they cannot be interpreted as a probability distribution.
Hyperbolic Tangent (Tanh): This function is incorrect because its output lies in the range (-1, 1). Probabilities must be non-negative, and the outputs are not normalized to sum to 1, so Tanh cannot represent a probability distribution. Tanh is typically used in the hidden layers of a neural network.
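A short check on the same hypothetical logits makes both points concrete: ReLU outputs are unbounded above, and Tanh outputs can be negative, so neither can be read as probabilities.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.9])

relu_out = np.maximum(0.0, logits)  # range [0, inf): entries can exceed 1
tanh_out = np.tanh(logits)          # range (-1, 1): entries can be negative

print(relu_out, relu_out.sum())     # not normalized, some values > 1
print(tanh_out, tanh_out.sum())     # contains negative values
```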