CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is performing topic modeling on a corpus of several hundred thousand financial reports. They construct a document-term matrix (DTM) as the initial feature set. Due to the large and specialized vocabulary, the resulting DTM is extremely high-dimensional and sparse. This leads to the "curse of dimensionality", which presents a significant challenge for subsequent analysis. Which of the following statements BEST describes a primary consequence of this issue and a standard method to address it?

  • The high dimensionality causes distance metrics to become less meaningful, hampering the performance of clustering and classification algorithms. This can be mitigated by applying dimensionality reduction techniques like Singular Value Decomposition (SVD).

  • The sparsity of the matrix guarantees that any machine learning model trained on it will be underfit. The primary solution is to use a more complex model, such as a deep neural network, to capture the sparse features.

  • The primary issue is the loss of semantic relationships between words, such as synonymy. This is addressed by applying TF-IDF weighting to the DTM before modeling.

  • The computational cost of creating the DTM itself is the main bottleneck. This is best solved by implementing a more efficient tokenization algorithm and using hash vectorization.
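
One option claims that distance metrics become less meaningful in high dimensions. The sketch below is a rough illustration of that effect, not a model of a real document-term matrix. It assumes uniformly random dense points rather than sparse term counts, and `relative_contrast` is a hypothetical helper name introduced here. It measures how far apart the nearest and farthest neighbours of a query point are, relative to the nearest one, as the number of dimensions grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest Euclidean distance
    from one random query point to n_points random points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# Assumption: these dimensions are arbitrary illustrative values.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should shrink as `dim` increases, meaning the nearest and farthest points become hard to tell apart by distance alone. Real DTMs are sparse and non-uniform, so the exact numbers would differ.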

Specialized Applications of Data Science