CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is tasked with creating a unified customer view by merging two datasets:

  • The transactions table contains transaction_id and customer_email.
  • The profiles table contains profile_id (a surrogate primary key), full_name, and email.

The profile_id does not exist in the transactions table. A preliminary analysis shows that the email fields in both tables contain formatting inconsistencies, typos, and a significant number of null values, making them unreliable as a sole identifier. Given this scenario, what is the most robust strategy for defining a key to merge these two tables?

  • Create a composite key for each dataset by first standardizing and then combining the customer_email and full_name fields before performing the join.

  • Generate a new surrogate key using a hash function on the transaction_id in the transactions table and the profile_id in the profiles table.

  • Perform a cross join (Cartesian product) between the two tables and then filter the results where the email fields are an exact match.

  • Use the email field as a natural key for the join after filtering out all records where the email is null from both datasets.
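
To make the "standardize, then combine" approach from the first option concrete, here is a minimal Python sketch. It rests on assumptions the question does not state: the transactions rows are assumed to also carry a full_name value (the scenario lists only transaction_id and customer_email), the helper names (normalize_email, normalize_name, composite_key, merge_on_composite_key) are hypothetical, and the sample rows are invented for illustration. Standardizing handles case, whitespace, and accent differences, but it would not repair typos; those would need fuzzy matching, which this sketch does not attempt.

```python
import re
import unicodedata


def normalize_email(email):
    """Trim and lowercase an email; missing values become ''."""
    if email is None:
        return ""
    return email.strip().lower()


def normalize_name(name):
    """Strip accents, collapse whitespace, and lowercase a name."""
    if name is None:
        return ""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def composite_key(email, full_name):
    """Combine the standardized fields; None when both are missing."""
    e, n = normalize_email(email), normalize_name(full_name)
    if not e and not n:
        return None  # never let two fully-null rows match each other
    return f"{e}|{n}"


def merge_on_composite_key(transactions, profiles):
    """Inner join of transactions to profiles on the composite key."""
    index = {}
    for p in profiles:
        key = composite_key(p.get("email"), p.get("full_name"))
        if key is not None:
            index.setdefault(key, []).append(p)

    merged = []
    for t in transactions:
        # Assumption: transactions also expose a full_name field.
        key = composite_key(t.get("customer_email"), t.get("full_name"))
        for p in index.get(key, []):
            merged.append(
                {"transaction_id": t["transaction_id"], "profile_id": p["profile_id"]}
            )
    return merged


# Invented sample rows, for illustration only.
profiles = [
    {"profile_id": 1, "full_name": "Jane  Doe", "email": " Jane.Doe@Example.com "},
    {"profile_id": 2, "full_name": "José Álvarez", "email": None},
]
transactions = [
    {"transaction_id": "T1", "customer_email": "jane.doe@example.com", "full_name": "jane doe"},
    {"transaction_id": "T2", "customer_email": None, "full_name": "Jose Alvarez"},
    {"transaction_id": "T3", "customer_email": None, "full_name": None},
]

merged = merge_on_composite_key(transactions, profiles)
```

In this sketch a row with a null email can still match on its standardized name alone, while a row with both fields missing is excluded from the join. Whether a name-only match is acceptable is a design choice that depends on how unique names are in the real data.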
