A data science team is building a convolutional neural network (CNN) to detect a rare type of retinal anomaly from a small and imbalanced dataset of fundus images. The initial model shows high accuracy on the training set but performs poorly on the validation set, indicating significant overfitting. The team has a limited budget and cannot collect more real-world data.
To improve the model's generalization and mitigate overfitting, which of the following data augmentation strategies would be the most effective and appropriate first step for this specific computer vision task?
Enriching the dataset by adding features extracted via geocoding the hospital location where each image was taken.
Implementing one-hot encoding on the image labels and then generating synthetic data using a Generative Adversarial Network (GAN).
Applying geometric transformations such as random rotations, flips, and zooms, combined with photometric distortions like adjustments to brightness and contrast.
Using Synthetic Minority Over-sampling Technique (SMOTE) directly on the flattened pixel values of the images.
The correct answer is to apply geometric and photometric transformations. In computer vision tasks with limited and imbalanced data, overfitting is a common problem. Data augmentation artificially expands the training dataset by creating modified, label-preserving versions of existing images. Geometric transformations (such as rotations, flips, and scaling) and photometric transformations (such as brightness and contrast adjustments) are standard, computationally cheap, and effective first-line strategies. Because the model sees a differently transformed version of each image in every epoch, it is encouraged to become invariant to changes in orientation, size, and lighting, which improves its ability to generalize to new, unseen images.
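As a rough illustration of the transformations described above, the following is a minimal NumPy sketch, not a production pipeline. The function name `augment`, the parameter ranges, and the random array standing in for a fundus image are all assumptions made for the example. Rotations are limited to multiples of 90 degrees and zoom is omitted to keep the code short; a real project would more likely use an augmentation library.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of an HxWxC float image in [0, 1].

    Hypothetical helper: the ranges below are illustrative, not tuned values.
    """
    out = image.copy()
    # Geometric: random horizontal flip.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Geometric: random rotation by a multiple of 90 degrees.
    # (For non-square images an odd k swaps height and width.)
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    # Photometric: scale contrast about the mean, then shift brightness.
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    mean = out.mean()
    out = (out - mean) * contrast + mean + brightness
    # Keep pixel values in the valid range.
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for a square fundus image
# Each call yields a different variant of the same source image.
batch = np.stack([augment(image, rng) for _ in range(8)])
```

Applying such a function on the fly during training means no augmented images need to be stored, which suits the limited budget described in the question.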
Using SMOTE directly on flattened pixel values is incorrect because SMOTE is designed for tabular, relatively low-dimensional data, not for high-dimensional, spatially structured image data. It creates new samples by interpolating between existing ones, so applying it to raw pixels would produce blended, "averaged" images that are likely to be medically nonsensical. GANs can generate realistic synthetic images, but they are complex, computationally expensive to train, and hard to train well on a small dataset, so they are typically a more advanced step rather than an appropriate first choice; one-hot encoding the labels is merely a label representation and adds no training data. Geocoding is a data enrichment technique, not augmentation, and hospital location is irrelevant to the visual patterns of a retinal anomaly, so it would not help the CNN learn the classification task.
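The SMOTE point above can be made concrete with a small sketch. It implements only the interpolation step that SMOTE uses to create a synthetic sample (nearest-neighbor selection is omitted), and the two toy 8x8 "images" and the name `smote_interpolate` are assumptions for illustration.

```python
import numpy as np

def smote_interpolate(x_i, x_j, lam):
    """SMOTE-style synthetic sample on the line segment between x_i and x_j."""
    return x_i + lam * (x_j - x_i)

# Two toy minority-class images, each with one bright "lesion" pixel
# at a different location.
a = np.zeros((8, 8))
a[1, 1] = 1.0
b = np.zeros((8, 8))
b[6, 6] = 1.0

# Interpolate in flattened pixel space, then reshape back to an image.
synthetic = smote_interpolate(a.ravel(), b.ravel(), 0.5).reshape(8, 8)

# The result contains two faint spots instead of one clear lesion.
print(synthetic[1, 1], synthetic[6, 6])
```

Neither input image looks like the output: the interpolation blends intensities pixel by pixel rather than moving or reshaping the structure, which is why the synthetic images tend not to resemble plausible anatomy.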