During error analysis of a document-classification pipeline, you notice that the model assigns the same feature representation to the sentences "Please book a conference room" and "The book is overdue." The ambiguity arises because the bag-of-words/TF-IDF vectorizer ignores the syntactic role of the token book. To capture this distinction while keeping the representation compatible with a sparse document-term matrix, which preprocessing adjustment should you implement?
Remove all verbs from the corpus prior to vectorization to eliminate tokens that cause sense ambiguity.
Lemmatize all nouns and verbs and omit POS information; lemmatization alone resolves the ambiguity.
Lower-case every token and collapse repeated characters (e.g., soooo → so); the classifier will infer syntactic context automatically.
Append the POS tag to every token before TF-IDF vectorization (e.g., book_VB vs book_NN) so syntactically different usages become distinct features.
Attaching each word's part-of-speech (POS) tag before vectorization creates separate features such as "book_VB" and "book_NN." This preserves grammatical context and lets the classifier distinguish verbs from nouns without expanding the feature space more than necessary. Removing all verbs would discard valuable information, lemmatizing alone would still collapse both uses of book to the same lemma, and simple lower-casing/character normalization provides no syntactic signal, so none of those approaches resolves the ambiguity.
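As a minimal sketch of the correct approach, the snippet below tags tokens with NLTK's perceptron tagger and feeds the POS-augmented text to scikit-learn's TfidfVectorizer. The library choices and resource names are illustrative (any tagger would do), and the exact tags emitted for "book" depend on the tagger and sentence context.

```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

# Resource names may vary slightly across NLTK versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def pos_augment(text: str) -> str:
    """Replace each token with token_TAG, e.g. 'book' -> 'book_VB'."""
    tokens = nltk.word_tokenize(text)
    return " ".join(f"{tok}_{tag}" for tok, tag in nltk.pos_tag(tokens))

docs = [
    "Please book a conference room",
    "The book is overdue",
]

# Tag first, then vectorize: the document-term matrix stays sparse,
# but the two usages of 'book' now map to distinct columns.
# lowercase=False keeps the appended tags (_VB, _NN) intact, and the
# token_pattern treats each whitespace-separated token_TAG as one feature.
vectorizer = TfidfVectorizer(lowercase=False, token_pattern=r"\S+")
tfidf = vectorizer.fit_transform(pos_augment(d) for d in docs)

# Expect separate features along the lines of 'book_VB' and 'book_NN'
# (exact tags are tagger-dependent).
print(sorted(vectorizer.get_feature_names_out()))
```

Because the tagging happens purely at the preprocessing stage, the downstream pipeline is unchanged: the output is still an ordinary sparse TF-IDF matrix, just with a modestly larger vocabulary.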