A data-science team is designing a neural network that must assign any combination of 30 risk flags to a transaction (for example, "velocity-spike" and "location-mismatch" can both be true for the same record). Downstream systems will apply different probability thresholds to each flag, so the model's outputs must be independent probabilities, one per flag. Which output-layer configuration and loss function best satisfies these requirements?
A single sigmoid-activated neuron trained with mean-squared-error loss
30 linear-activation neurons optimized with hinge loss
30 output neurons with softmax activation and categorical cross-entropy loss
30 output neurons, each with sigmoid activation, optimized with binary cross-entropy loss
Because more than one flag can be active at the same time, the task is multi-label rather than single-label multi-class. The output layer therefore needs one neuron per label so that each label is treated as an independent binary target. A sigmoid activation on each neuron maps every logit to a value in (0, 1) without forcing the 30 probabilities to sum to 1, and binary cross-entropy loss optimizes each label as a separate Bernoulli outcome.
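A minimal sketch of this configuration in PyTorch follows; the hidden-layer sizes and input width are illustrative assumptions, not part of the question. Note that `BCEWithLogitsLoss` folds the sigmoid into the loss for numerical stability, which is the idiomatic way to pair sigmoid outputs with binary cross-entropy:

```python
import torch
import torch.nn as nn

NUM_FLAGS = 30  # one output neuron per risk flag

# Hypothetical network: only the 30-unit output layer is prescribed
# by the question; the input size (64) and hidden size (128) are
# placeholder assumptions.
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_FLAGS),  # raw logits, one per flag
)

# BCEWithLogitsLoss applies sigmoid internally and computes binary
# cross-entropy per label, treating each flag as an independent
# Bernoulli outcome.
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 64)                                  # batch of 8 transactions
targets = torch.randint(0, 2, (8, NUM_FLAGS)).float()   # multi-hot labels
loss = criterion(model(x), targets)

# At inference time, apply sigmoid to obtain 30 independent
# probabilities; downstream systems can then threshold each flag
# separately.
probs = torch.sigmoid(model(x))
```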
Softmax with categorical cross-entropy is intended for mutually exclusive classes: it normalizes the outputs to sum to 1, making it impossible to assign high scores to several flags simultaneously. A single sigmoid output can model only one binary label, not 30 independent ones, and linear activations with hinge loss are margin-based and do not produce calibrated probabilities at all.
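A quick numeric comparison makes the softmax constraint concrete. With two equally strong logits, softmax splits the probability mass between them, while independent sigmoids let both score high (the logit values below are arbitrary illustrations):

```python
import torch

logits = torch.tensor([3.0, 3.0, -2.0])

# Softmax normalizes across classes: the two strong flags split the
# mass, so neither can approach 1.0 on its own.
print(torch.softmax(logits, dim=0))  # ~[0.498, 0.498, 0.003], sums to 1

# Sigmoid scores each logit independently: both strong flags can be
# near 1.0 at the same time.
print(torch.sigmoid(logits))         # ~[0.953, 0.953, 0.119]
```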
Therefore, the correct choice is 30 neurons with sigmoid activation trained with binary cross-entropy.