CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is vectorizing 10,000 technical-support tickets with scikit-learn's default TF-IDF configuration (raw term counts, smooth_idf=True, natural log).

For one ticket, the statistics below are observed:

  • The token "kernel" occurs 8 times in the ticket and appears in 30 different documents in the corpus.
  • The token "error" occurs 15 times in the ticket and appears in 9,000 documents in the corpus.
  • The token "segmentation" occurs 4 times in the ticket and appears in 120 documents in the corpus.

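Using the smoothed TF-IDF formulas
idf(t) = ln[(N + 1)/(df(t) + 1)] + 1 and tf-idf(t, d) = tf(t, d) × idf(t),
where N = 10,000 is the number of documents in the corpus, df(t) is the number of documents containing t, and tf(t, d) is the raw count of t in the ticket, which token receives the largest TF-IDF weight in this ticket?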

  • segmentation

  • error

  • It cannot be determined without knowing the total number of terms in the ticket.

  • kernel
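
The arithmetic behind the question can be checked with a short script. This is a minimal sketch that applies the stated formula directly to the three tokens; it does not call scikit-learn, and the names `stats`, `smoothed_idf`, `weights`, and `best` are illustrative choices rather than part of any library.

```python
import math

N = 10_000  # number of documents in the corpus, as given in the question

# token -> (tf: raw count in the ticket, df: documents containing the token)
stats = {
    "kernel": (8, 30),
    "error": (15, 9_000),
    "segmentation": (4, 120),
}

def smoothed_idf(df, n_docs):
    """idf(t) = ln[(N + 1)/(df + 1)] + 1, the formula stated in the question."""
    return math.log((n_docs + 1) / (df + 1)) + 1

# tf-idf(t, d) = tf(t, d) * idf(t), using raw term counts
weights = {tok: tf * smoothed_idf(df, N) for tok, (tf, df) in stats.items()}

# Print tokens from largest to smallest weight
for tok, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok:<13}{w:.2f}")
# kernel       54.21
# segmentation 21.66
# error        16.58

best = max(weights, key=weights.get)
```

Under these assumptions "kernel" ranks first: "error" has the highest raw count but is nearly ubiquitous, so its idf is close to 1. The ticket's total term count is not needed, because the formula uses raw counts, and scikit-learn's default L2 normalization divides every weight in the ticket by the same factor, which leaves the ranking unchanged.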

CompTIA DataX DY0-001 (V1)
Specialized Applications of Data Science