CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is evaluating several classifiers for a large-scale e-mail filtering project. The feature set is a 750 000 × 120 000 bag-of-words matrix stored in compressed-sparse-row (CSR) format with fewer than 1 % non-zero values. Training MultinomialNB and LinearSVC completes quickly and stays below 4 GB of RAM, but running GaussianNB on the same matrix causes the Python process to allocate more than 60 GB before the job is killed.

Which property of sparse-matrix handling in this scenario best explains why the GaussianNB run exhausts memory while the other two models do not?

  • GaussianNB requires integer word-count features, so it duplicates the sparse matrix as a separate float array before fitting.

  • GaussianNB computes an all-pairs Euclidean distance matrix and therefore materializes a full n × n distance table in memory.

  • GaussianNB implicitly converts the CSR matrix to a dense array in order to calculate feature means and variances, causing all zero entries to be stored explicitly.

  • GaussianNB applies kernel density estimation that adds synthetic features, dramatically increasing dimensionality when the input is sparse.
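The memory gap at the heart of this scenario, CSR storage versus an explicit dense array, can be sketched with a small scipy example. The dimensions below are hypothetical, scaled-down stand-ins for the 750 000 × 120 000 matrix; at the full size, a dense float64 copy would need 750 000 × 120 000 × 8 B ≈ 720 GB, consistent with the process being killed after allocating 60 GB.

```python
import numpy as np
from scipy import sparse

# Hypothetical scaled-down stand-in for the 750 000 x 120 000 CSR matrix
# in the question (the real matrix is far too large to densify).
X = sparse.random(1_000, 2_000, density=0.01, format="csr",
                  dtype=np.float64, random_state=0)

# CSR stores only the non-zero values plus two index arrays.
csr_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# A dense array stores every zero explicitly, one float64 per cell,
# which is what computing per-feature means/variances on a contiguous
# array effectively demands.
dense_bytes = X.shape[0] * X.shape[1] * np.dtype(np.float64).itemsize

print(f"CSR:   {csr_bytes:,} bytes")
print(f"Dense: {dense_bytes:,} bytes")
```

At 1 % density the dense copy is tens of times larger than the CSR representation, and the ratio grows as density falls, which is why models that consume CSR directly stay within a few gigabytes while a densifying estimator exhausts RAM.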

Domain: Modeling, Analysis, and Outcomes