CompTIA DataX DY0-001 (V1) Practice Question

Your team is preparing data for a churn-prediction model. A web-events table with 600 million rows must be linked to a 5-million-row CRM table so that behavioural features can be aggregated per customer. The only potentially shared attributes are (1) email_address, free text that contains typos, mixed case and extra whitespace, and (2) phone, 10- to 14-digit numbers with inconsistent punctuation and an optional country code. An exact inner join on both attributes retrieves only 72% of the expected matches. The business requires at least 95% linkage, and each linked pair must retain a confidence or similarity score for later audit. Memory is limited, so generating every possible record pair is not feasible.

Which data-wrangling approach best meets these requirements?

  • Lower-case and trim both attributes, hash each with SHA-256 to create a composite key, and join the two tables exactly on that hash value.

  • Standardise phone numbers, apply a phonetic encoding or an edit-distance measure (for example Soundex or Levenshtein distance) to the email local part, then perform a fuzzy join that outputs a similarity score column.

  • Remove all rows that have null values in either attribute and repeat an inner join on the cleaned columns without any further preprocessing.

  • One-hot encode the email domains, cluster the two tables with k-means, and cross-join records that fall into the same cluster to create candidate links.
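
To make the techniques named in the options concrete, the following is a minimal Python sketch of a standardise-then-fuzzy-match linkage that keeps a similarity score per pair. It is an illustration under stated assumptions, not a reference solution: the field names `event_id` and `customer_id`, the 0.85 threshold, and the rule of keeping the last 10 digits of a phone number to drop an optional country code are all hypothetical choices. The standardised phone is used as a blocking key so that only records sharing a block are compared, which avoids generating every possible pair. For brevity the whole email is compared rather than only the local part.

```python
import re
from collections import defaultdict


def normalize_email(raw):
    """Lower-case the email and strip all whitespace."""
    return re.sub(r"\s+", "", raw).lower()


def normalize_phone(raw, national_digits=10):
    """Keep digits only, then the last `national_digits` digits.

    Assumption: the national number is 10 digits and any extra
    leading digits are a country code.
    """
    digits = re.sub(r"\D", "", raw)
    return digits[-national_digits:]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance rescaled to a 0..1 similarity score."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def fuzzy_link(events, crm, threshold=0.85):
    """Link event rows to CRM rows within phone blocks, keeping the score."""
    # Blocking: index the smaller CRM table by standardised phone.
    blocks = defaultdict(list)
    for row in crm:
        blocks[normalize_phone(row["phone"])].append(row)

    links = []
    for ev in events:
        ev_email = normalize_email(ev["email_address"])
        # Only candidates in the same phone block are compared.
        for cand in blocks.get(normalize_phone(ev["phone"]), []):
            score = similarity(ev_email, normalize_email(cand["email_address"]))
            if score >= threshold:
                links.append((ev["event_id"], cand["customer_id"], round(score, 3)))
    return links


events = [{"event_id": 1,
           "email_address": " Jane.Doe@Example.com ",
           "phone": "+1 (555) 010-2345"}]
crm = [{"customer_id": "C1",
        "email_address": "jane.doe@exmaple.com",
        "phone": "555.010.2345"}]
links = fuzzy_link(events, crm)
```

Blocking on phone alone would miss pairs whose phone numbers differ, so a fuller pipeline would likely add a second blocking pass on another key (for instance the email domain). At the table sizes in the question, the same logic would normally run in a distributed engine rather than in plain Python loops.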

Operations and Processes