
CompTIA DataX Practice Test (DY0-001)

Use the form below to configure your CompTIA DataX Practice Test (DY0-001). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

  • Questions: the number of questions in the practice test (free users are limited to 20 questions; upgrade for unlimited).
  • Seconds Per Question: determines how much time you have to finish the practice test.
  • Exam Objectives: which exam objectives should be included in the practice test.

CompTIA DataX DY0-001 (V1) Information

CompTIA DataX is an expert‑level, vendor‑neutral certification aimed at deeply experienced data science professionals. Launched on July 25, 2024, the exam verifies advanced competencies across the full data science lifecycle - from mathematical modeling and machine learning to deployment and specialized applications like NLP, computer vision, and anomaly detection.

The exam comprehensively covers five key domains:

  • Mathematics and Statistics (~17%)
  • Modeling, Analysis, and Outcomes (~24%)
  • Machine Learning (~24%)
  • Operations and Processes (~22%)
  • Specialized Applications of Data Science (~13%)

It includes a mix of multiple‑choice and performance‑based questions (PBQs), simulating real-world tasks like interpreting data pipelines or optimizing machine learning workflows. The duration is 165 minutes, with a maximum of 90 questions. Scoring is pass/fail only, with no scaled score reported.

Free CompTIA DataX DY0-001 (V1) Practice Test

Press Start when you are ready, or press Change to modify any settings for the practice test.

  • Questions: 15
  • Time: Unlimited
  • Included Topics:
    Mathematics and Statistics
    Modeling, Analysis, and Outcomes
    Machine Learning
    Operations and Processes
    Specialized Applications of Data Science
Question 1 of 15

In a churn-prediction initiative, your team builds a gradient-boosting model using 24 monthly snapshots (January 2023 - December 2024). Before the model can enter any online experiments, policy requires an offline validation step that (a) prevents temporal leakage and (b) ensures that every record is used for training at least once during hyper-parameter search. Which validation strategy best meets both requirements?

  • A single 80/20 hold-out split where the last five months are used only for testing and never included in training.

  • Leave-one-customer-out cross-validation that removes one customer's entire history per fold regardless of transaction dates.

  • Random k-fold cross-validation with shuffling enabled so each fold contains a mixture of months.

  • Walk-forward (expanding-window) time-series cross-validation that trains on the earliest months and validates on the next contiguous month, repeating until all folds are evaluated.
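
For context on the last option: scikit-learn's TimeSeriesSplit implements exactly this kind of expanding-window (walk-forward) scheme. A minimal sketch, assuming the 24 monthly snapshots are already sorted chronologically (the feature matrix and labels below are placeholders):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 24 monthly snapshots, already sorted chronologically (Jan 2023 .. Dec 2024).
X = np.arange(24).reshape(-1, 1)                      # placeholder features
y = np.random.default_rng(0).integers(0, 2, size=24)  # placeholder churn labels

# Expanding-window splits: each fold trains on the earliest months and
# validates on the next contiguous month, so no future data leaks backwards.
tscv = TimeSeriesSplit(n_splits=5, test_size=1)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    print(f"fold {fold}: train months 0..{train_idx.max()}, validate month {val_idx[0]}")
```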

Question 2 of 15

A data scientist implements a multilayer perceptron with three hidden layers, but mistakenly sets every neuron's activation function to the identity mapping f(x)=x instead of a non-linear function such as ReLU. After training, the network behaves exactly like a single-layer linear regression, regardless of how many hidden units it contains. Which explanation best describes why the network loses expressive power in this situation?

  • Identity activations implicitly impose strong L2 regularization on the weights, preventing the model from fitting non-linear patterns.

  • Composing purely affine transformations (weights and bias) produces another affine transformation, so without a non-linear activation every layer collapses into one overall linear mapping of the inputs.

  • Using identity activations makes every weight matrix symmetric and rank-deficient, restricting the network to learn only linear relationships.

  • Identity activations force all bias terms to cancel during forward propagation, eliminating the offsets needed for non-linear decision boundaries.
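
A quick way to see the collapse described in the second option is to stack a few purely affine layers and compare them with the single affine map obtained by composing them. A minimal NumPy sketch with arbitrary random weights (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)

# Three "hidden layers" with identity activations: each is just x @ W + b.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)
W3, b3 = rng.normal(size=(8, 2)), rng.normal(size=2)

x = rng.normal(size=(5, 4))                       # a batch of 5 inputs
deep = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3        # the "3-layer" network

# Composing affine maps yields a single affine map (W_eq, b_eq).
W_eq = W1 @ W2 @ W3
b_eq = (b1 @ W2 + b2) @ W3 + b3
shallow = x @ W_eq + b_eq

print(np.allclose(deep, shallow))                 # True: equivalent to one linear layer
```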

Question 3 of 15

You are comparing response-time distributions for four successive firmware versions deployed across 8,000 IoT gateways. The measurements are right-skewed and clearly bimodal because some devices cache results while others do not. Management wants a single side-by-side visualization that (1) reveals the multimodal shape of each version's distribution, (2) highlights differences in medians and interquartile ranges, and (3) makes the thickness of the long upper tails easy to inspect. Which type of chart will satisfy all three requirements with the least additional annotation?

  • A stacked bar chart showing the count of observations in predefined latency buckets.

  • A faceted line plot of the cumulative distribution function (CDF) for each version.

  • A traditional box-and-whisker plot for each version without additional overlays.

  • A violin plot for each firmware version, sharing a common vertical response-time axis.
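
For reference on the last option, a violin plot per version takes only a few lines with matplotlib; the data below are synthetic bimodal latencies invented purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
versions = ["v1", "v2", "v3", "v4"]

# Synthetic right-skewed, bimodal response times: a fast "cached" mode plus a
# slower, long-tailed mode for devices that miss the cache.
data = [np.concatenate([rng.lognormal(3.0, 0.2, 3000),    # cached
                        rng.lognormal(4.5, 0.5, 1000)])   # uncached, heavy tail
        for _ in versions]

fig, ax = plt.subplots(figsize=(8, 4))
ax.violinplot(data, showmedians=True)   # full density shape, median, and tails
ax.set_xticks(range(1, len(versions) + 1), labels=versions)
ax.set_ylabel("response time (ms)")
plt.show()
```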

Question 4 of 15

A data scientist is analyzing latency data from hundreds of distributed microservices to ensure they meet service level objectives (SLOs). The dataset contains response times in milliseconds (a continuous variable) and the corresponding service ID (a categorical variable). The primary goal of the initial exploratory analysis is to efficiently compare the distributions of response times across all services, specifically to identify services with high variability and a significant number of extreme outlier response times. Which of the following visualizations is the most effective and scalable for this specific task?

  • A box and whisker plot.

  • A Q-Q plot comparing each service's response time distribution to a normal distribution.

  • A series of histograms, one for each service.

  • A scatter plot with service IDs on the x-axis and response times on the y-axis.

Question 5 of 15

A data science team is evaluating four association rules that have already met the project's minimum support and confidence thresholds:

  • Rule A: support = 2%, confidence = 80%
  • Rule B: support = 4%, confidence = 50%
  • Rule C: support = 1%, confidence = 90%
  • Rule D: support = 3%, confidence = 60%

To rank the rules, the team will use the reinforcement metric, also known as Rule Power Factor. Based on this metric, which rule is the most powerful?

  • Rule C

  • Rule A

  • Rule B

  • Rule D
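
The reinforcement metric (Rule Power Factor) is simply support × confidence, so the ranking can be checked in a few lines; the sketch below just hard-codes the four rules from the question:

```python
# Rule Power Factor (RPF) = support x confidence
rules = {
    "A": (0.02, 0.80),
    "B": (0.04, 0.50),
    "C": (0.01, 0.90),
    "D": (0.03, 0.60),
}

rpf = {name: support * confidence for name, (support, confidence) in rules.items()}
for name, score in sorted(rpf.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Rule {name}: RPF = {score:.4f}")
```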

Question 6 of 15

A financial services firm is developing an advanced AI assistant to help analysts review large volumes of legal contracts. The system must first interpret complex, free-form analyst queries, such as, "Summarize the key liabilities for all agreements with ACME Corp signed after 2022". After processing the request and extracting the relevant information from the documents, the system must then present its findings in a clear, coherent paragraph. Which two NLP applications are most representative of the core functions for interpreting the analyst's request and then generating the final output?

  • Speech Recognition and Speech Generation

  • Question-Answering and Sentiment Analysis

  • Natural Language Understanding (NLU) and Natural Language Generation (NLG)

  • Named-Entity Recognition (NER) and Text Summarization

Question 7 of 15

A public health agency is conducting a longitudinal study on the impact of a new manufacturing facility on community respiratory health over a 15-year period. The data science team is using administrative data from local clinics, which consists of patient records, diagnostic codes, and dates of service. Which of the following represents the most significant analytical challenge inherent to using this type of data for this specific study?

  • The procedural overhead of anonymizing personally identifiable information (PII) to comply with healthcare data regulations.

  • The high financial cost of licensing and integrating patient data from numerous independent healthcare providers.

  • Systematic shifts in data attributes resulting from changes in diagnostic criteria and data collection protocols over the 15-year period.

  • Selection bias resulting from the fact that the dataset only includes individuals from the community who have sought medical care.

Question 8 of 15

During a quarterly quality-control audit, an engineer randomly selects 15 memory modules from a warehouse of approximately 5,000 units without replacement and records how many are defective. She plans to model the count of defectives with a Binomial(15, p) distribution to build a confidence interval for the unknown defect rate. Which fundamental assumption required by the binomial model is most likely violated by this sampling design and, if ignored, will typically overstate the sampling variance?

  • Each trial outcome is independent of all other trials.

  • Every module can be classified into exactly two mutually exclusive states (defective or non-defective).

  • Both np and n(1 − p) must be at least 5 to justify a normal approximation.

  • The total number of trials is fixed in advance at 15.
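
Because the 15 modules are drawn without replacement from a finite lot, the exact model is hypergeometric rather than binomial. The sketch below assumes an illustrative 4% defect rate just to compare the two variances and show the finite-population correction:

```python
from scipy.stats import binom, hypergeom

lot_size, n_draws = 5000, 15
p = 0.04                              # assumed defect rate (illustrative only)
n_defective = int(lot_size * p)       # defective modules in the lot

var_binom = binom(n_draws, p).var()                           # independent trials
var_hyper = hypergeom(lot_size, n_defective, n_draws).var()   # without replacement

print(f"binomial variance       : {var_binom:.4f}")
print(f"hypergeometric variance : {var_hyper:.4f}")
print(f"finite-population correction: {(lot_size - n_draws) / (lot_size - 1):.4f}")
```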

Question 9 of 15

During the planning phase of a land-cover-classification project, a machine-learning engineer proposes re-using a ResNet-50 model that was originally trained on ImageNet (natural RGB photographs) as the starting point for a new classifier.

The new task involves hyperspectral satellite images containing 128 spectral bands whose visual characteristics differ greatly from natural photographs. Only about 1,000 labeled satellite images are available, GPU time is limited, and the team intends to freeze the early convolutional layers and fine-tune the remaining layers.

Which single factor in this scenario most strongly suggests that transfer learning from the ImageNet model is likely to harm rather than help model performance?

  • The plan to freeze the early convolutional layers and fine-tune only the later layers.

  • The large spectral and visual mismatch between the ImageNet source data and the hyperspectral satellite imagery.

  • The limited number of labeled satellite images (about 1,000).

  • The restricted GPU compute budget.

Question 10 of 15

Your data-science team runs its forecasting service in Kubernetes and exposes predictions through a REST endpoint /predict. You want to release updated model versions frequently while keeping latency below 50 ms for most requests. The release process must be able to:

  • direct only a small percentage of real-time traffic to the new version at first,
  • observe live accuracy and latency metrics before expanding use, and
  • roll back immediately if production quality degrades.

Which deployment approach BEST satisfies these requirements and follows API-access best practices?

  • Use a blue-green deployment that replaces all production pods with the new version during a scheduled maintenance window.

  • Configure an API-gateway canary release that routes a small, weighted percentage of /predict calls to the new model version and adjusts the weight based on monitored metrics.

  • Mirror 100% of live requests to the new model in a shadow deployment but discard its predictions so users never see them.

  • Update the existing /predict endpoint in-place and rely on automated container restarts to roll back if health checks fail.

Question 11 of 15

You are building a text-clustering workflow that starts with an extremely sparse 1,000,000 × 50,000 term-document matrix X. Because the matrix will not fit in memory when densified, constructing the covariance matrix XᵀX for a standard principal component analysis (PCA) is not an option. Instead, you choose to apply a truncated singular value decomposition (t-SVD) to reduce the dimensionality of X prior to clustering.

Which statement best explains why t-SVD is generally preferred over covariance-based PCA for this scenario?

  • t-SVD can be computed with iterative methods (e.g., randomized SVD or Lanczos) that multiply X by vectors without ever materializing XᵀX, allowing the decomposition to run efficiently on the sparse matrix.

  • t-SVD automatically scales every column of X to unit variance, eliminating the need for TF-IDF or other term-weighting schemes.

  • t-SVD guarantees that the resulting singular vectors are both orthogonal and sparse, making clusters easier to interpret than those obtained from PCA.

  • t-SVD forces all components of the lower-dimensional representation to be non-negative, so the projected features can be read as probabilities without any post-processing.
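
For reference, scikit-learn's TruncatedSVD operates directly on a scipy sparse matrix without ever forming XᵀX. A minimal sketch using a smaller random sparse matrix as a stand-in for the real term-document matrix:

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Stand-in for the real term-document matrix: sparse and never densified.
X = sp.random(100_000, 50_000, density=1e-4, format="csr", random_state=0)

# Randomized truncated SVD needs only products of X with thin matrices/vectors,
# so the 50,000 x 50,000 covariance matrix is never materialized.
svd = TruncatedSVD(n_components=200, algorithm="randomized", random_state=0)
X_reduced = svd.fit_transform(X)        # dense (n_docs, 200) array ready for clustering

print(X_reduced.shape)
```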

Question 12 of 15

During a model audit, you examine the first convolutional layer of an image-classification network. The layer receives a 128×128×3 input and applies 64 kernels of size 5×5 with stride 1 and "same" padding so that the spatial resolution of the output remains 128×128. Bias terms are present (one per kernel), but you must report only the number of trainable weights excluding biases in this layer. How many weights does the layer contain?

  • 9,600

  • 4,800

  • 78,643,200

  • 1,600
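
The weight count of a convolutional layer is kernel height × kernel width × input channels × number of kernels, with biases counted separately. A quick check with PyTorch, assuming the layer described above:

```python
import torch.nn as nn

# 3 input channels (RGB), 64 kernels of size 5x5, stride 1, "same" padding.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding="same")

n_weights = conv.weight.numel()   # 5 * 5 * 3 * 64
n_biases = conv.bias.numel()      # one bias per kernel
print(n_weights, n_biases)        # 4800 64
```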

Question 13 of 15

You are building an anomaly-detection service for a wearable device that streams 3-D acceleration vectors x ∈ ℝ³. Because the sensor can be mounted in any orientation, the raw data may later be multiplied by an unknown orthonormal rotation matrix R before they reach your model. You need a distance function d(x, y) whose numerical value stays exactly the same when evaluated on the rotated vectors (Rx, Ry). Which of the following commonly used distance metrics fails to meet this rotation-invariance requirement and therefore should be avoided in this situation?

  • Cosine distance, 1 − cos θ

  • Gaussian radial basis distance d(x, y) = 1 − exp(−γ‖x − y‖²)

  • Euclidean (L2) distance

  • Manhattan (L1) distance
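
A quick numerical check: draw a random 3-D rotation, then compare each candidate distance before and after rotating both vectors. A minimal sketch using scipy (the vectors and γ are arbitrary illustrative values):

```python
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.spatial.distance import euclidean, cityblock, cosine

rng = np.random.default_rng(7)
x, y = rng.normal(size=3), rng.normal(size=3)
R = Rotation.random(random_state=7).as_matrix()   # random orthonormal rotation

for name, dist in [("euclidean", euclidean), ("manhattan", cityblock), ("cosine", cosine)]:
    print(f"{name:9s} before={dist(x, y):.6f}  after={dist(R @ x, R @ y):.6f}")

# The Gaussian RBF distance depends only on the Euclidean norm, so it is invariant too.
gamma = 0.5
print("rbf       before={:.6f}  after={:.6f}".format(
    1 - np.exp(-gamma * euclidean(x, y) ** 2),
    1 - np.exp(-gamma * euclidean(R @ x, R @ y) ** 2)))
```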

Question 14 of 15

You are developing a nearest-neighbor search over 15,000-dimensional TF-IDF vectors that vary greatly in total magnitude because some customers generate far more events than others. You want any two vectors that point in exactly the same direction, even if one is simply a scaled-up version of the other, to be treated as maximally similar (distance = 0). Which statement correctly explains why using cosine distance meets this requirement?

  • Cosine distance is computed as the sum of absolute component-wise differences, eliminating any dependence on vector length.

  • Cosine distance satisfies the triangle inequality, making it a proper metric that supports metric-tree indexing without modification.

  • After z-score standardization, cosine distance becomes algebraically identical to Euclidean distance, so either metric may be used interchangeably.

  • Multiplying either vector by any positive scalar leaves the cosine distance between the two vectors unchanged, so vectors that differ only in length are considered identical.
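
The scale-invariance property in the last option is easy to verify numerically: multiplying either vector by a positive scalar leaves the cosine distance unchanged, and identically-directed vectors get a distance of (numerically) zero. A minimal sketch:

```python
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.default_rng(3)
u = rng.random(15_000)          # stand-in for a TF-IDF vector
v = 7.3 * u                     # same direction, much larger magnitude

print(cosine(u, v))                                   # ~0.0: maximally similar
print(np.isclose(cosine(u, 2.5 * v), cosine(u, v)))   # rescaling changes nothing
```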

Question 15 of 15

A data scientist is vectorizing 10,000 technical-support tickets with scikit-learn's default TF-IDF configuration (raw term counts, smooth_idf=True, natural log).

For one ticket, the statistics below are observed:

  • The token "kernel" occurs 8 times in the ticket and appears in 30 different documents in the corpus.
  • The token "error" occurs 15 times in the ticket and appears in 9,000 documents in the corpus.
  • The token "segmentation" occurs 4 times in the ticket and appears in 120 documents in the corpus.

Using the TF-IDF formula
idf(t) = ln[(N + 1)/(df + 1)] + 1 and tf-idf(t, d) = tf(t, d) × idf(t),
where N = 10,000, which token receives the largest TF-IDF weight in this ticket?

  • error

  • kernel

  • segmentation

  • It cannot be determined without knowing the total number of terms in the ticket.
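
The three weights can be checked directly from the formula given in the question; a minimal sketch that plugs in the stated counts:

```python
import math

N = 10_000   # documents in the corpus

# token: (term frequency in this ticket, document frequency in the corpus)
tokens = {"kernel": (8, 30), "error": (15, 9000), "segmentation": (4, 120)}

for token, (tf, df) in tokens.items():
    idf = math.log((N + 1) / (df + 1)) + 1    # smooth_idf=True, natural log
    print(f"{token:13s} tf-idf = {tf * idf:.2f}")
```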