CompTIA DataX DY0-001 (V1) Practice Question

A data science team is preparing a large customer dataset to train a machine learning model for predicting fraudulent transactions. The dataset contains direct identifiers such as names and email addresses, as well as quasi-identifiers like ZIP codes and dates of birth. To adhere to strict data privacy regulations, the team must de-identify the data before analysis. Which of the following strategies provides the best balance between robustly protecting Personally Identifiable Information (PII) and preserving the analytical value of the features for the model?

  • Apply a character-masking function to all PII fields, replacing each character with a fixed symbol (e.g., 'X').

  • Completely remove all columns identified as direct and quasi-identifiers from the dataset.

  • Encrypt the entire dataset before loading it into the training environment and decrypt it just before model fitting.

  • Remove the direct identifiers and apply a consistent tokenization scheme to the quasi-identifiers.
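To make the last option concrete, here is a minimal sketch of dropping direct identifiers and consistently tokenizing quasi-identifiers. It is an illustration under stated assumptions, not a prescribed method: the column names, the sample records, the `SECRET_KEY` value, and the helpers `tokenize` and `deidentify` are all hypothetical, and a keyed HMAC-SHA256 stands in for whatever tokenization service a real team would use (many use a vault-backed lookup table instead).

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would come from a secrets manager.
SECRET_KEY = b"example-key-not-for-production"

# Assumed column names; the question does not specify a schema.
DIRECT_IDENTIFIERS = {"name", "email"}
QUASI_IDENTIFIERS = {"zip_code", "date_of_birth"}


def tokenize(value: str, field: str) -> str:
    """Return a stable token: the same field/value pair always maps to the same token."""
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Drop direct identifiers, tokenize quasi-identifiers, keep everything else."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # removed entirely
        if field in QUASI_IDENTIFIERS:
            cleaned[field] = tokenize(str(value), field)
        else:
            cleaned[field] = value  # analytical features pass through unchanged
    return cleaned


# Made-up sample records.
records = [
    {"name": "Ada Park", "email": "ada@example.com", "zip_code": "30301",
     "date_of_birth": "1990-04-12", "amount": 120.50, "is_fraud": 0},
    {"name": "Ben Ortiz", "email": "ben@example.com", "zip_code": "30301",
     "date_of_birth": "1985-11-02", "amount": 9800.00, "is_fraud": 1},
    {"name": "Cy Lund", "email": "cy@example.com", "zip_code": "98101",
     "date_of_birth": "1990-04-12", "amount": 42.00, "is_fraud": 0},
]

deidentified = [deidentify(r) for r in records]
```

Because the mapping is consistent, two customers in the same ZIP code still share a token, so the column can be used as a categorical feature or for grouping. Ordering and distance between the original values are not preserved by this kind of token.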

Domain: Operations and Processes