During a model‐design iteration you need to tune eight mixed hyperparameters for a multilingual transformer-based text-classification model. One full training run takes about 12 GPU-hours, but a quick look at experiments from earlier sprints shows that F1 measured after the first epoch strongly predicts the final F1. Your team has an overall budget of 48 GPU-hours for this tuning cycle and wants the single best F1 score achievable within that limit. Which hyperparameter-search strategy is the MOST appropriate for these constraints?
A) Perform a random search in which each randomly selected configuration is trained for the full 12 GPU-hours.
B) Apply Gaussian-process Bayesian optimization, training each proposed configuration for the maximum number of epochs.
C) Use Hyperband with successive halving so each trial starts with one epoch and additional epochs are allocated only to the best-performing configurations.
D) Run an exhaustive grid search that trains every hyperparameter combination to full convergence.
Hyperband begins by allocating a very small resource budget (for example, one epoch) to a large number of randomly sampled hyperparameter configurations and then repeatedly applies successive halving: it prunes low-performing trials and increases the budget only for the top performers. Because it stops unpromising runs early, it can explore far more configurations than grid or plain random search under the same compute limit, and it often delivers an order-of-magnitude speed-up over Bayesian or exhaustive methods when training time dominates cost. In contrast, exhaustive grid search and single-pass random search would train every configuration for the full 12 GPU-hours, so the 48 GPU-hour budget would be exhausted after only four configurations. A classical Gaussian-process Bayesian optimization loop likewise evaluates each suggested configuration to convergence and therefore cannot exploit the strong first-epoch signal unless it is augmented with multi-fidelity extensions. Hyperband with successive halving (option C) therefore most directly satisfies the time-and-resource constraint while maximizing the chance of finding a high-F1 model.
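To make the resource-allocation logic concrete, here is a minimal sketch of one successive-halving bracket, the core subroutine that Hyperband runs repeatedly with different starting budgets. The sample_config and train_and_eval functions are hypothetical placeholders invented for this illustration, not part of any real library: in practice, train_and_eval would run the actual training loop for the given number of epochs and return the validation F1.

import math
import random

def sample_config():
    # Hypothetical sampler over two of the eight mixed hyperparameters;
    # a real sampler would cover all eight.
    return {
        "learning_rate": 10 ** random.uniform(-5, -3),
        "batch_size": random.choice([16, 32, 64]),
    }

def train_and_eval(config, epochs):
    # Toy stand-in that returns a simulated F1 score so the sketch runs
    # end to end; replace with a real train-then-evaluate step.
    base = 0.7 - abs(math.log10(config["learning_rate"]) + 4) * 0.05
    return base + random.gauss(0, 0.02) + 0.01 * math.log2(epochs + 1)

def successive_halving(n_configs=16, min_epochs=1, eta=2):
    """Run one successive-halving bracket.

    Starts n_configs trials at min_epochs each, then repeatedly keeps
    the top 1/eta of trials and multiplies their epoch budget by eta,
    so compute concentrates on the best-performing configurations.
    """
    trials = [sample_config() for _ in range(n_configs)]
    epochs = min_epochs
    while len(trials) > 1:
        scored = [(train_and_eval(cfg, epochs), cfg) for cfg in trials]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # higher F1 is better
        trials = [cfg for _, cfg in scored[: max(1, len(trials) // eta)]]
        epochs *= eta
    return trials[0]

if __name__ == "__main__":
    best = successive_halving(n_configs=16, min_epochs=1, eta=2)
    print("best configuration:", best)

With n_configs=16 and eta=2, the bracket evaluates 16 trials for 1 epoch, 8 for 2, 4 for 4, and 2 for 8 before returning a single survivor. Full Hyperband wraps this subroutine in several such brackets that trade off n_configs against min_epochs, hedging against the chance that one-epoch F1 misranks some configurations; given the strong first-epoch signal described in the question, even a single aggressive bracket would spend most of the 48 GPU-hours on the most promising trials.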