An engineering team implements dropout regularization in a feed-forward neural network using the "inverted" convention adopted by modern libraries. The dropout rate is set to 0.30 (each unit is dropped with probability 0.30). Which statement correctly describes what happens to the activations during training and inference under this convention?
Training: each unit is set to zero with probability 0.30 and the surviving activations are multiplied by 0.70; inference: no additional scaling is required because the network learns to compensate automatically.
Training: each unit is multiplied by Gaussian noise with mean 0.70 and variance 0.21; inference: activations are divided by 0.70 before being passed forward.
Training: each unit is set to zero with probability 0.30 with no scaling; inference: all activations are multiplied by 0.70 to compensate for the missing units.
Training: each unit is set to zero with probability 0.30 and the surviving activations are divided by 0.70; inference: no units are dropped and no extra scaling is applied.
With inverted dropout, the layer performs two distinct actions. During training it (1) independently zeroes each unit with probability 0.30 and (2) rescales the surviving activations by 1/0.70, so that the expected activation magnitude matches what the network will see at inference time. When the model switches to inference, the dropout layer is effectively turned off: no units are dropped and no further scaling is applied, because the training-time rescaling has already preserved the expected magnitude. Choices that postpone scaling to inference, shrink activations instead of amplifying them, or inject Gaussian noise describe variants that are not how standard inverted dropout operates.
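A minimal NumPy sketch of this behavior, assuming a dropout rate of 0.30; the function name, argument names, and shapes are illustrative, not taken from any particular library:

```python
import numpy as np

def inverted_dropout(activations, p_drop=0.30, training=True, rng=None):
    """Apply inverted dropout to a layer's activations.

    Training: zero each unit independently with probability p_drop and
    divide the survivors by (1 - p_drop) so the expected activation is
    unchanged. Inference: return the activations untouched.
    """
    if not training:
        return activations                      # no dropping, no scaling at inference
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - p_drop                    # 0.70 when p_drop = 0.30
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob       # survivors scaled by 1/0.70

# Quick check: the mean activation is approximately preserved during training.
a = np.ones(10000)
print(a.mean(), inverted_dropout(a, 0.30, training=True).mean())
```

Because the scaling happens inside the training-time call, the inference path is just the identity, which is exactly why the correct option states that no units are dropped and no extra scaling is applied at inference.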