During an early design iteration, your team is fine-tuning a 250-million-parameter Transformer on a single 24 GB GPU. When you raise the mini-batch size from 16 to 64, training fails with an out-of-memory (OOM) error, and the budget does not allow additional hardware. You have one day to rerun the experiment and want to keep the architecture and hyperparameter search results unchanged. Which change to the training configuration is the most appropriate way to satisfy the resource constraint while minimizing impact on model accuracy and development time?
Enable mixed-precision (FP16/bfloat16) training with automatic loss scaling.
Pad every input sequence to exactly 512 tokens so tensor shapes are consistent across batches.
Double the model's hidden dimension but freeze all even-numbered layers to reduce gradient updates.
Replace the AdamW optimizer with standard SGD without momentum to eliminate optimizer state.
Enabling mixed-precision (FP16 or bfloat16) training stores activations and gradients, the tensors that dominate peak memory at a batch size of 64, in 16-bit formats, roughly halving their footprint while typically preserving accuracy when automatic loss scaling guards against FP16 underflow. The change is a few lines (often a single flag) in modern frameworks, so it can be implemented quickly.
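In PyTorch, for example, the change amounts to wrapping the forward pass in an autocast context and routing the backward pass through a gradient scaler. The sketch below uses a small stand-in model and synthetic data rather than the scenario's 250M-parameter Transformer:

```python
import torch
import torch.nn as nn

device = "cuda"  # mixed precision as sketched here targets a CUDA GPU

# Stand-in model and data; in the scenario these would be the 250M-parameter
# Transformer and its real training batches.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=4
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()            # automatic loss scaling for FP16

x = torch.randn(64, 128, 512, device=device)    # (batch, seq_len, d_model)

for step in range(3):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(x)                     # forward runs in 16-bit; activations shrink ~2x
        loss = out.float().pow(2).mean()   # placeholder loss
    scaler.scale(loss).backward()          # scale loss so small FP16 gradients don't underflow
    scaler.step(optimizer)                 # unscales gradients, skips step if inf/nan found
    scaler.update()                        # adapt the scale factor for the next iteration
```

On hardware that supports it, bfloat16 can be used instead of float16; because bfloat16 has the same exponent range as FP32, the loss-scaling step is then unnecessary.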
Replacing AdamW with plain SGD without momentum would remove Adam's two extra moment tensors and save some memory, but the gain is smaller than the roughly 2× reduction in activation and gradient memory from mixed precision, and the optimizer switch would likely require new hyperparameter tuning, risking schedule delays and accuracy loss.
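A quick back-of-the-envelope calculation (assuming FP32 optimizer state, as standard AdamW keeps) shows how much memory the switch would reclaim:

```python
# Rough memory arithmetic for a 250M-parameter model with FP32 optimizer state.
params = 250_000_000
bytes_fp32 = 4

adamw_state = params * bytes_fp32 * 2      # exp_avg + exp_avg_sq moment tensors
print(f"AdamW moments: {adamw_state / 2**30:.1f} GiB")   # ~1.9 GiB

# The activation and gradient savings from mixed precision grow with batch size
# and sequence length, so at batch size 64 they typically exceed this figure.
```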
Padding all sequences to a fixed 512-token length increases, rather than decreases, memory usage because every shorter sequence now consumes the maximum length.
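A small illustration (assuming a hidden size of 1024 and a batch whose sequences are mostly shorter than 512 tokens) of how much larger the fixed-length tensor becomes compared with padding to the longest sequence in each batch:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical batch: 64 sequences of varying lengths, all shorter than 512 tokens.
lengths = torch.randint(low=32, high=256, size=(64,))
seqs = [torch.randn(int(n), 1024) for n in lengths]    # (seq_len, hidden) per example

dynamic = pad_sequence(seqs, batch_first=True)         # padded to the longest in the batch
fixed = torch.zeros(64, 512, 1024)                     # padded to a fixed 512 tokens
fixed[:, : dynamic.shape[1], :] = dynamic

print(dynamic.shape, fixed.shape)
print(f"fixed-512 tensor is {fixed.numel() / dynamic.numel():.1f}x larger")
```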
Doubling the hidden dimension, even with every even-numbered layer frozen, roughly quadruples the size of the attention and feed-forward weight matrices and enlarges every activation tensor, increasing peak memory; freezing layers only skips their gradient updates, not the forward activations that trigger the OOM.
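A minimal sketch (assuming PyTorch and a CUDA device; the sizes are illustrative, not the scenario's 250M-parameter model) of why freezing does not relieve the activation pressure:

```python
import torch
import torch.nn as nn

# Small stand-in Transformer stack.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(8)]
).cuda()

# "Freeze all even-numbered layers": this only stops their parameter updates.
for i, layer in enumerate(layers):
    if i % 2 == 0:
        layer.requires_grad_(False)

x = torch.randn(64, 256, 512, device="cuda")
torch.cuda.reset_peak_memory_stats()

h = x
for layer in layers:
    h = layer(h)            # frozen layers still run and still produce full-size activations
loss = h.pow(2).mean()
loss.backward()             # backward through the trainable layers still needs those activations

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```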
Therefore, mixed-precision training is the most effective, low-risk remedy for the GPU memory constraint in this scenario.