You are preparing a training pipeline for an edge-deployed object-detector that must accept 1024 × 1024 inputs, but your annotated UAV dataset consists of 2048 × 2048 images containing objects whose sizes vary by two orders of magnitude. The goal is to expose the model to a wide range of object scales without distorting object geometry or breaking the correspondence between images and bounding-box labels. Which augmentation pipeline best achieves this objective?
Independently rescale width and height by random factors between 0.5 and 2.0 while keeping the 1024 × 1024 canvas fixed, leaving the original bounding-box coordinates unchanged.
Down-sample every image once to 1024 × 1024 with nearest-neighbor interpolation and discard any images whose dimensions are not divisible by two.
Apply an isotropic random scale in the range 0.5-2.0 to each image, then center-crop or resize to 1024 × 1024 and update every bounding-box coordinate by the same scale factor.
Leave the image size untouched but apply random 90-degree rotations followed by brightness/contrast jitter and horizontal flips.
Applying a single random isotropic scale factor to the entire image (for example, between 0.5× and 2×) enlarges or shrinks every pixel uniformly. Because the same affine transform is applied to the annotations, each bounding box stays tightly aligned with its object. Cropping or resizing the result back to the network's fixed 1024 × 1024 input preserves aspect ratio while presenting objects at many apparent sizes, which improves scale invariance. In contrast, independently scaling width and height (anisotropic scaling) warps object geometry and, if the boxes are not recomputed, leaves them misaligned. A one-time down-sample removes all scale diversity, and nearest-neighbor interpolation introduces aliasing that can erase small targets entirely. Rotations and color jitter add useful variation but do not change object scale, so they do not satisfy the stated requirement.
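A minimal sketch of this idea in Python is shown below, assuming OpenCV and NumPy are available; the function name isotropic_scale_augment and its parameters are illustrative rather than part of any particular library. It applies one random scale factor to both axes, multiplies every box coordinate by the same factor, then pads and center-crops back to the fixed 1024 × 1024 canvas.

```python
import random
import numpy as np
import cv2  # assumed available; any bilinear resize routine would work


def isotropic_scale_augment(image, boxes, out_size=1024, scale_range=(0.5, 2.0)):
    """Apply one random isotropic scale to an image and its boxes,
    then pad/center-crop back to a fixed out_size x out_size canvas.

    image : H x W x C uint8 array
    boxes : N x 4 float array of [x_min, y_min, x_max, y_max] in pixels
    """
    s = random.uniform(*scale_range)

    # Same factor on both axes, so object geometry is preserved.
    scaled = cv2.resize(image, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    boxes = boxes.astype(np.float32) * s  # identical scale applied to every coordinate

    h, w = scaled.shape[:2]

    # Pad if the scaled image is smaller than the target canvas.
    pad_h, pad_w = max(out_size - h, 0), max(out_size - w, 0)
    if pad_h or pad_w:
        scaled = cv2.copyMakeBorder(scaled, 0, pad_h, 0, pad_w,
                                    cv2.BORDER_CONSTANT, value=0)
        h, w = scaled.shape[:2]

    # Center-crop to out_size x out_size and shift boxes into the crop frame.
    y0 = (h - out_size) // 2
    x0 = (w - out_size) // 2
    crop = scaled[y0:y0 + out_size, x0:x0 + out_size]
    boxes[:, [0, 2]] -= x0
    boxes[:, [1, 3]] -= y0

    # Clip to the canvas and drop boxes that end up entirely outside the crop.
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, out_size)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, out_size)
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return crop, boxes[keep], keep
```

The returned keep mask lets the caller filter class labels in step with the surviving boxes; in practice the crop offset would usually be randomized rather than centered so that objects near the image border are also sampled.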