An engineer must simplify a convolutional neural network (CNN) that will run on a memory-constrained embedded device. The goal is to reduce the model's parameter count and lower the risk of overfitting without discarding the channel-wise information learned by the last convolutional block. Which pooling layer inserted directly after the final convolution best satisfies these requirements and why?
Max pooling with a 2×2 window, because it keeps only the strongest activation in each local region of the feature map.
Stochastic pooling, because it randomly samples activations from each pooling window to introduce regularization.
Global average pooling, because it converts each feature map into a single value, eliminating the need for large fully connected layers.
Fractional max pooling, because it downsamples using non-integer strides to preserve more spatial information than standard max pooling.
Global average pooling (GAP) computes the mean over each entire feature map, producing a single scalar per channel. Because this operation has no trainable weights, it can replace the large, dense fully connected layers that would otherwise follow the convolutional stack. Doing so removes millions of parameters, cutting memory usage and the risk of overfitting, while still retaining one summary value for every channel learned by the last convolutional block. Max pooling, stochastic pooling, and fractional max pooling also have no trainable parameters, but they perform local down-sampling rather than collapsing each map to one value, so they cannot serve as a drop-in replacement for a classifier's fully connected head and do not yield the same parameter savings.
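To make the parameter savings concrete, the sketch below compares two classifier heads in PyTorch: one that flattens the last feature map into a large fully connected layer, and one that applies GAP first. The 256-channel, 7×7 feature-map size and 10-class output are illustrative assumptions, not values from the question.

```python
import torch
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    # Total number of trainable and non-trainable parameters in the module.
    return sum(p.numel() for p in module.parameters())

# Head 1: flatten the 256 x 7 x 7 feature map and feed it to a dense layer.
fc_head = nn.Sequential(
    nn.Flatten(),                # 256 * 7 * 7 = 12,544 inputs
    nn.Linear(256 * 7 * 7, 10),  # 12,544 * 10 + 10 = 125,450 parameters
)

# Head 2: GAP collapses each of the 256 feature maps to one scalar first.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),     # global average pooling, no trainable weights
    nn.Flatten(),                # 256 inputs
    nn.Linear(256, 10),          # 256 * 10 + 10 = 2,570 parameters
)

x = torch.randn(1, 256, 7, 7)    # dummy output of the final convolutional block
print(fc_head(x).shape, count_params(fc_head))    # torch.Size([1, 10]) 125450
print(gap_head(x).shape, count_params(gap_head))  # torch.Size([1, 10]) 2570
```

Under these assumed shapes, the GAP-based head needs roughly 50 times fewer parameters than the flatten-plus-dense head while producing the same 10-way output, which is exactly the trade-off the question is targeting.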