A machine learning engineer is upgrading a natural language processing pipeline that uses an RNN-based architecture for machine translation. The existing model struggles with long-term dependencies in lengthy sentences and faces slow training times due to its sequential nature. To address these issues, the engineer decides to implement a Transformer model. Which core component of the Transformer architecture directly addresses both the challenge of capturing long-range dependencies and the bottleneck of sequential processing?
The correct answer is the self-attention mechanism, the fundamental innovation of the Transformer architecture. When encoding any given token, self-attention weighs the importance of every other token in the input sequence, creating a direct path between any two positions regardless of how far apart they are; this is what lets the model capture long-range dependencies. Because the attention scores for all tokens are computed simultaneously as matrix operations, the whole sequence is processed in parallel, removing the sequential bottleneck of RNNs, where information and gradients must pass through many intermediate steps, leading to slow training and the vanishing gradient problem.
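To make this concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The shapes and weight names (Wq, Wk, Wv) are illustrative assumptions rather than any particular library's API; a real Transformer uses multiple heads and learned projections.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X:          (seq_len, d_model) token representations
    Wq, Wk, Wv: (d_model, d_k) projection matrices (toy, single head)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token attends to every other token in one matrix product,
    # so the path between any two positions has length 1.
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (seq_len, d_k)

# Toy usage: 5 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 4)
```

Note that nothing in this computation depends on processing tokens one after another, which is exactly why training can be parallelized across the sequence.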
Positional encodings are incorrect because their purpose is to inject information about the order of tokens in the sequence, which self-attention by itself does not capture. They are necessary for the model to understand word order, but they neither model dependencies between tokens nor enable parallel processing.
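For reference, a minimal sketch of the fixed sinusoidal encodings described in the original Transformer paper; the function name and the assumption of an even model width are illustrative only.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings (assumes d_model is even):
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encodings are simply added to the token embeddings
# so that attention scores become order-aware.
print(sinusoidal_positional_encoding(seq_len=5, d_model=8).shape)  # (5, 8)
```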
The encoder-decoder stack is a high-level structure that also exists in many RNN-based models for sequence-to-sequence tasks. Therefore, it is not the unique component in Transformers that solves the stated problems. The innovation lies in how the encoder and decoder are constructed (i.e., with self-attention layers), not the stack itself.
Residual connections and layer normalization are general deep learning techniques used to stabilize training and improve gradient flow in deep networks, including Transformers. They mitigate vanishing gradients that arise from network depth, but they address neither the long-range dependency problem nor the sequential-processing bottleneck that RNNs suffer from.
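A minimal sketch of how these pieces are typically combined, assuming the post-norm wrapping LayerNorm(x + Sublayer(x)) from the original paper; the helper names are hypothetical and the layer norm omits the learnable scale and shift for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token's features to zero mean and unit variance
    (simplified: no learnable gain or bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Post-norm residual wrapper: LayerNorm(x + Sublayer(x)).

    The skip connection gives gradients a direct path around the
    sublayer, which is what stabilizes very deep stacks."""
    return layer_norm(x + sublayer(x))

# Toy usage with a stand-in sublayer
x = np.random.default_rng(1).normal(size=(5, 8))
print(residual_block(x, lambda h: 0.1 * h).shape)  # (5, 8)
```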