Your data-science team is iterating on an image-classification pipeline that must run entirely on a battery-powered handheld scanner. The current ResNet-50 baseline delivers the required F1 score (0.84) but violates the deployment constraints, taking roughly 350 ms per image and occupying more than 1 GB of RAM. The business requirement allows at most a 2-percentage-point drop in accuracy while meeting both the latency (< 100 ms) and memory limits. During the model-architecture iteration phase of the design process, which next step is most appropriate for satisfying the constraints?
Replace the ResNet-50 with a MobileNetV3-Large student model trained via knowledge distillation from the current ResNet-50 teacher.
Apply post-training 8-bit integer weight quantization to the trained ResNet-50 model without changing its structure.
Continue training the existing ResNet-50 using a cosine-annealing learning-rate schedule and a smaller batch size to improve convergence.
Augment the training set with additional labeled images for the most error-prone classes before retraining the ResNet-50.
Swapping the heavy ResNet-50 for a MobileNetV3-Large student trained with knowledge distillation from the existing model directly changes the network architecture to one designed for low-latency, low-memory edge inference. Distillation transfers most of the teacher's predictive power, so accuracy typically stays within a few percentage points of the teacher, while the lightweight architecture cuts parameter count and inference time severalfold. The other options either are not architecture iteration or are unlikely to deliver the required 3.5× speed-up and large memory savings: adjusting the learning-rate schedule only tunes training hyperparameters; post-training 8-bit quantization leaves the architecture unchanged and, on its own, rarely yields a 3.5× latency reduction; and adding more labeled data addresses generalization rather than resource constraints.
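To make the distillation step concrete, here is a minimal sketch using PyTorch and torchvision. The question names neither framework, so treat the framework choice, the `num_classes=10` head, and the hyperparameters `T=4.0` and `alpha=0.7` as illustrative assumptions: the frozen ResNet-50 teacher supplies softened logits, and the MobileNetV3-Large student trains on a blend of the distillation (KL) term and the ordinary cross-entropy loss.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Teacher: the existing ResNet-50; load your trained checkpoint and freeze it.
teacher = models.resnet50(num_classes=10)
teacher.eval()

# Student: MobileNetV3-Large, the compact architecture that will be deployed.
student = models.mobilenet_v3_large(num_classes=10)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def train_step(images, labels):
    with torch.no_grad():                    # teacher only provides soft targets
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, only the MobileNetV3-Large student ships to the scanner; the ResNet-50 teacher is discarded, which is what produces the latency and memory savings.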