In the TF-IDF text-classification pipeline you are building for English-language restaurant reviews, the initial document-term matrix contains more than 150 000 unique tokens because words such as "run", "running", and "ran" are treated as separate features. You want to reduce this sparsity without accidentally conflating semantically different words like "universe" and "university". Which single text-preparation step best satisfies the requirement?
Switch to character-level tokenization so each character becomes a feature.
Remove all stop words, including verbs and adjectives, before vectorization.
Apply part-of-speech-aware lemmatization to convert each token to its dictionary lemma.
Run the Porter stemming algorithm to strip suffixes from every token.
Part-of-speech-aware lemmatization replaces each inflected form with its canonical dictionary lemma (e.g., "running" → "run"), using POS tags to choose the correct form. This groups true morphological variants together, shrinking the vocabulary and reducing sparsity while still distinguishing unrelated words. Porter stemming also merges variants, but it can over-truncate and map unrelated words to the same root ("universe"/"university" → "univers"). Character-level tokenization increases rather than reduces dimensionality, and indiscriminate stop-word removal drops many sentiment-bearing tokens while leaving inflectional variation intact.
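As an illustrative sketch (assuming NLTK with its WordNet lemmatizer, Porter stemmer, and the standard tokenizer/tagger data already downloaded), the snippet below shows how the two normalizers diverge on the words mentioned in the explanation:

```python
# Sketch: POS-aware lemmatization vs. Porter stemming (assumes NLTK data
# such as the punkt tokenizer, POS tagger, and WordNet are already installed
# via nltk.download()).
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import PorterStemmer, WordNetLemmatizer


def to_wordnet_pos(treebank_tag):
    """Map a Penn Treebank POS tag to the WordNet POS constant."""
    if treebank_tag.startswith("J"):
        return wordnet.ADJ
    if treebank_tag.startswith("V"):
        return wordnet.VERB
    if treebank_tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN


lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

text = "He ran to the university while running a blog about the universe"
tokens = word_tokenize(text)

# POS-aware lemmatization: "ran" and "running" both map to "run",
# while "universe" and "university" remain distinct features.
lemmas = [lemmatizer.lemmatize(tok, to_wordnet_pos(tag))
          for tok, tag in pos_tag(tokens)]

# Porter stemming: both "universe" and "university" collapse to "univers".
stems = [stemmer.stem(tok) for tok in tokens]

print(lemmas)
print(stems)
```

Running the lemmatized tokens through the TF-IDF vectorizer instead of the raw tokens merges only true inflectional variants, which is exactly the vocabulary reduction the question asks for.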