A data scientist is fine-tuning a large language model for a conversational AI application. They observe that while the generated responses are grammatically sound, they frequently converge on repetitive, generic phrases and lack creativity. The goal is to increase the diversity of the output without making it nonsensical. Which text generation decoding strategy should the data scientist implement to dynamically adjust the number of token choices at each step based on the cumulative probability distribution, effectively balancing coherence and creativity?
The correct answer is Nucleus (Top-P) sampling. This strategy is specifically designed to address the trade-off between coherence and diversity by dynamically selecting the token pool for sampling. It works by choosing the smallest set of tokens whose cumulative probability mass exceeds a predefined threshold 'p'. This allows the size of the sampling pool to adapt at each step: it becomes small when the model is certain about the next token (promoting coherence) and larger when the model is uncertain (promoting diversity).
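As a minimal sketch of how nucleus sampling works (assuming raw logits arrive as a NumPy array; the function name and parameters are illustrative, not a specific library's API):

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus / top-p sampling)."""
    rng = rng or np.random.default_rng()
    # Softmax with the usual max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]              # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix with mass > p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Note how the nucleus shrinks to a single token when one token dominates the distribution (coherence) and grows to cover many tokens when the distribution is flat (diversity) -- the dynamic behavior the question describes.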
Greedy search is incorrect because it is a deterministic method that always selects the single most probable next token. This approach is highly prone to producing repetitive and generic text, which is the problem the data scientist is trying to solve.
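The repetition problem is easy to see in a sketch (here `step_logits_fn` is a hypothetical stand-in for a model forward pass, not a real API):

```python
import numpy as np

def greedy_decode(step_logits_fn, start_ids, max_new_tokens=20, eos_id=None):
    """Greedy decoding: always append the single most probable next token."""
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(step_logits_fn(ids)))  # deterministic choice
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```

Because every step is a deterministic argmax, a model whose distribution changes little between steps will emit the same token over and over -- exactly the repetitive, generic output in the scenario.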
Beam search is also incorrect. While it improves upon greedy search by keeping track of multiple hypotheses (beams), it still fundamentally favors high-probability sequences and is known to struggle with generating diverse outputs, often producing safe and generic text.
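One step of beam search can be sketched as follows (again with a hypothetical `step_logits_fn` standing in for the model; scores are summed log-probabilities):

```python
import numpy as np

def beam_search_step(beams, step_logits_fn, beam_width=3):
    """Extend each (ids, score) hypothesis with every vocabulary token,
    then keep only the beam_width highest-scoring sequences overall."""
    candidates = []
    for ids, score in beams:
        logits = step_logits_fn(ids)
        # log-softmax: log p = logits - logsumexp(logits)
        log_probs = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
        for tok, lp in enumerate(log_probs):
            candidates.append((ids + [tok], score + lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]
```

Since every surviving hypothesis is ranked purely by cumulative probability, the beams tend to cluster around the same safe, high-probability continuations rather than exploring diverse ones.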
Top-K sampling is a plausible but incorrect distractor. It introduces randomness by sampling from the 'k' most likely tokens, which does increase diversity. However, the number of choices 'k' is fixed. This is different from Nucleus sampling, which uses a dynamic set of tokens based on the probability distribution, as specified in the question.
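The contrast with nucleus sampling is the fixed pool size, visible in a minimal sketch (illustrative function name, raw logits as a NumPy array assumed):

```python
import numpy as np

def top_k_sample(logits, k=5, rng=None):
    """Top-K sampling: sample from a FIXED number k of most likely tokens,
    no matter how the probability mass is actually spread."""
    rng = rng or np.random.default_rng()
    top = np.argsort(logits)[-k:]                 # ids of the k highest logits
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                          # renormalize over the k tokens
    return int(rng.choice(top, p=probs))
```

Here `k` stays constant whether the model is nearly certain (where k tokens is too many, hurting coherence) or highly uncertain (where k tokens may be too few, hurting diversity) -- the limitation nucleus sampling's dynamic cutoff avoids.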