CompTIA DataX DY0-001 (V1) Practice Question

During error analysis of a document-classification pipeline, you notice that the token book maps to the same feature in the sentences "Please book a conference room" and "The book is overdue," even though it is a verb in the first sentence and a noun in the second. The ambiguity arises because the bag-of-words/TF-IDF vectorizer ignores the syntactic role of each token. To capture this distinction while keeping the representation compatible with a sparse document-term matrix, which preprocessing adjustment should you implement?

  • Remove all verbs from the corpus prior to vectorization to eliminate tokens that cause sense ambiguity.

  • Lemmatize all nouns and verbs and omit POS information; lemmatization alone resolves the ambiguity.

  • Lowercase every token and collapse repeated characters (e.g., soooo → so); the classifier will infer syntactic context automatically.

  • Append the POS tag to every token before TF-IDF vectorization (e.g., book_VB vs book_NN) so syntactically different usages become distinct features.
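
To make the token_TAG idea in the last option concrete, here is a minimal sketch in plain Python. It assumes the (token, tag) pairs have already been produced by some POS tagger; the tags are hard-coded and hypothetical, and the names TAGGED_DOCS, tag_tokens, and tfidf are illustrative only. The TF-IDF weighting is one common variant (term frequency times log(N/df) + 1), not the formula of any particular library.

```python
import math
from collections import Counter

# Hypothetical pre-tagged input: a real pipeline would get these pairs from a
# POS tagger. They are hard-coded so the sketch has no external dependencies.
TAGGED_DOCS = [
    [("please", "UH"), ("book", "VB"), ("a", "DT"),
     ("conference", "NN"), ("room", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("overdue", "JJ")],
]


def tag_tokens(tagged_doc):
    """Join each token with its POS tag, e.g. ('book', 'VB') -> 'book_VB'."""
    return [f"{tok}_{tag}" for tok, tag in tagged_doc]


def tfidf(docs):
    """Return one sparse {feature: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature.
    doc_freq = Counter(feat for doc in docs for feat in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append({
            feat: (count / len(doc)) * (math.log(n_docs / doc_freq[feat]) + 1.0)
            for feat, count in counts.items()
        })
    return rows


plain_rows = tfidf([[tok for tok, _ in doc] for doc in TAGGED_DOCS])
tagged_rows = tfidf([tag_tokens(doc) for doc in TAGGED_DOCS])

# Without tags, both sentences share the single feature "book".
shared_plain = set(plain_rows[0]) & set(plain_rows[1])
# With tags, the verb and noun usages land in separate columns.
shared_tagged = set(tagged_rows[0]) & set(tagged_rows[1])

print(shared_plain)   # {'book'}
print(shared_tagged)  # set()
```

Each row is a sparse mapping from feature to weight, so the tagged features still fit a sparse document-term matrix; the trade-off is a larger vocabulary, since every (token, tag) combination becomes its own column.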

Specialized Applications of Data Science