Your team must deploy real-time semantic segmentation of high-resolution drone imagery on a low-power edge device that offers only 100 MB of GPU memory. The existing baseline is a standard U-Net with regular 3 × 3 convolutions, which exceeds the memory budget even after pruning. Fine-detail accuracy must be preserved and per-frame latency kept below 30 ms.
Which architectural change is MOST likely to meet the memory constraint without causing a large drop in segmentation accuracy?
Replace all standard convolutions in the encoder and decoder with depth-wise separable convolutions while retaining the original skip connections.
Keep the current network but add a fully connected conditional random field (CRF) module as a post-processing refinement stage.
Replace the encoder-decoder with a stacked hourglass network that keeps full-resolution feature maps across multiple stacked modules.
Switch to an FCN-32s architecture that performs a single 32× upsampling of the final feature map.
Replacing standard convolutions with depth-wise separable convolutions keeps the familiar encoder-decoder topology and its skip connections, so spatial detail is retained while the factorized filters cut parameters and activations by an order of magnitude-often > 20×-keeping the model well under the 100 MB limit.
A stacked hourglass network maintains full-resolution feature maps through multiple stacked modules; its large activation tensors and extra stacks raise memory far beyond the budget.
An FCN-32s upsamples only once at a stride of 32; although it is smaller, the coarse output loses boundary detail and typically scores much lower than encoder-decoder variants.
Adding a fully connected CRF after the decoder refines edges but does not shrink the network itself and introduces additional computation, so it fails to solve the memory problem.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What are depth-wise separable convolutions, and why do they reduce memory usage?
Open an interactive chat with Bash
What advantages do skip connections bring in encoder-decoder architectures?
Open an interactive chat with Bash
Why is the stacked hourglass network not suitable for low-memory devices?