CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is tasked with creating a unified customer view by merging two datasets:

The transactions table contains transaction_id and customer_email.
The profiles table contains profile_id (a surrogate primary key), full_name, and email.

The profile_id does not exist in the transactions table. A preliminary analysis shows that the email fields in both tables contain formatting inconsistencies, typos, and a significant number of null values, making them unreliable as a sole identifier. Given this scenario, what is the most robust strategy for defining a key to merge these two tables?

Generate a new surrogate key using a hash function on the transaction_id in the transactions table and the profile_id in the profiles table.
Perform a cross join (Cartesian product) between the two tables and then filter the results where the email fields are an exact match.
Use the email field as a natural key for the join after filtering out all records where the email is null from both datasets.
Create a composite key for each dataset by first standardizing and then combining the customer_email and full_name fields before performing the join.

CompTIA DataX DY0-001 (V1)

Operations and Processes

Answer Description
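
The standardize-then-combine option can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that are not in the question: the scenario lists only transaction_id and customer_email for the transactions table, so the sketch assumes a hypothetical customer_name column is also available there; the sample rows, the "|" separator, and the helper names are likewise made up for illustration. Standardizing handles case, whitespace, accents, and punctuation, but it does not repair typos, which would need fuzzy matching on top of this.

```python
import re
import unicodedata


def standardize_email(email):
    """Trim whitespace and lowercase; return None for missing or blank values."""
    if email is None:
        return None
    cleaned = email.strip().lower()
    return cleaned or None


def standardize_name(name):
    """Drop accents and punctuation, collapse whitespace, lowercase."""
    if name is None:
        return None
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^a-z\s]", "", ascii_name.lower())
    cleaned = " ".join(letters_only.split())
    return cleaned or None


def composite_key(email, name):
    """Combine both standardized fields; None if either part is missing."""
    e, n = standardize_email(email), standardize_name(name)
    if e is None or n is None:
        return None
    return f"{e}|{n}"


def merge_on_composite_key(transactions, profiles):
    """Left-join transactions to profiles on the composite key."""
    lookup = {}
    for p in profiles:
        key = composite_key(p["email"], p["full_name"])
        if key is not None:  # never index rows without a usable key
            lookup[key] = p["profile_id"]
    merged = []
    for t in transactions:
        # customer_name is an assumed column, not part of the original scenario
        key = composite_key(t["customer_email"], t["customer_name"])
        merged.append({"transaction_id": t["transaction_id"], "profile_id": lookup.get(key)})
    return merged


# Hypothetical sample rows
profiles = [
    {"profile_id": 1, "full_name": "Ana  María López", "email": " Ana.Lopez@Example.com "},
    {"profile_id": 2, "full_name": "Bob O'Neil", "email": None},
]
transactions = [
    {"transaction_id": "T100", "customer_name": "ana maria lopez", "customer_email": "ana.lopez@example.com"},
    {"transaction_id": "T101", "customer_name": "Bob ONeil", "customer_email": None},
]

merged = merge_on_composite_key(transactions, profiles)
```

In this sketch, rows with a null email get no key and stay unmatched instead of being joined to each other, which keeps null values from producing false matches.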

Ask Bash

Why is using a composite key better than relying on a single field like email?

What does 'standardizing' fields like email and full_name mean in this context?

Why is using a cross join not a feasible option for merging these datasets?
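
On the cross join question, a small sketch shows the two problems with that option: the Cartesian product grows as the product of the two row counts, and an exact-match filter on unstandardized emails still misses records that differ only in case or whitespace. The email values and the row counts below are hypothetical.

```python
from itertools import product

# Hypothetical raw email values with case, whitespace, and null issues
transaction_emails = ["ana.lopez@example.com", " Ana.Lopez@Example.com", None]
profile_emails = ["Ana.Lopez@example.com ", "bob@example.com", None]

# A cross join pairs every transaction row with every profile row
pairs = list(product(transaction_emails, profile_emails))
pair_count = len(pairs)  # 3 * 3 rows

# Exact-match filter on the raw values; nulls never match, as in SQL
exact_matches = [(t, p) for t, p in pairs if t is not None and t == p]


def cross_join_rows(n_transactions, n_profiles):
    """Number of intermediate rows a cross join materializes."""
    return n_transactions * n_profiles


# Assumed table sizes, for scale only
large_case = cross_join_rows(1_000_000, 200_000)
```

Here the filter finds no matches even though all three non-null Ana Lopez addresses refer to the same mailbox once case and whitespace are normalized.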
