CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is developing a text classification model using a large corpus of over one million documents. They have generated TF-IDF feature vectors, resulting in a document-term matrix with more than 200,000 unique terms (features). When training a k-Nearest Neighbors (k-NN) classifier on these high-dimensional, sparse vectors, they observe two primary issues: extremely long training times and poor predictive accuracy. Which of the following strategies provides the most effective solution to address both the computational inefficiency and the model performance problem?

  • Augment the feature set by including bigrams and trigrams from the text corpus.

  • Convert the TF-IDF matrix into a Compressed Sparse Row (CSR) format.

  • Standardize the feature vectors using a StandardScaler to have zero mean and unit variance.

  • Apply Truncated SVD to the feature matrix to reduce its dimensionality.
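
For readers who want to experiment with the scenario, the following is a minimal sketch of the dimensionality-reduction option using scikit-learn. The six-document corpus, the labels, `n_components=2`, and `n_neighbors=3` are illustrative assumptions and do not come from the question; a real corpus of this size would typically use a few hundred components, chosen by validation.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Tiny hypothetical corpus standing in for the million-document one.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "my cat chased the dog",
    "stocks fell as markets closed",
    "the market rallied on earnings",
    "investors sold stocks and bonds",
]
labels = ["pets", "pets", "pets", "finance", "finance", "finance"]

# TF-IDF yields one column per unique term; scikit-learn returns it
# as a SciPy sparse matrix.
X = TfidfVectorizer().fit_transform(docs)

# Truncated SVD works directly on sparse input and projects each
# document onto a small number of dense components.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)
print(X.shape, "->", X_reduced.shape)

# The same steps chained into one estimator. Re-normalizing after SVD
# makes Euclidean distance behave like cosine distance for k-NN.
pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),
    KNeighborsClassifier(n_neighbors=3),
)
pipeline.fit(docs, labels)
prediction = pipeline.predict(["the dog sat with the cat"])
```

Swapping the `TruncatedSVD` step for one of the other options (for example, adding `ngram_range=(1, 3)` to the vectorizer) gives a quick way to compare how each choice changes the feature count and the neighbor search.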

CompTIA DataX DY0-001 (V1)
Modeling, Analysis, and Outcomes