AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS Glue job loads yesterday's records into an Amazon Redshift table named stage_txn. Each record has the columns txn_id (unique identifier) and updated_at (timestamp). The job must upsert data into the prod_txn table by keeping only the most recent version of each txn_id and removing any older duplicates, all within a single SQL statement that runs inside Redshift. Which solution satisfies these requirements and follows Redshift best practices?
Use a MERGE statement with a subquery that selects the latest row per txn_id, for example: MERGE INTO prod_txn AS p USING (SELECT * FROM stage_txn QUALIFY ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY updated_at DESC)=1) AS s ON p.txn_id = s.txn_id WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ...;
Create a view on stage_txn that filters duplicates with QUALIFY and schedule separate UPDATE and INSERT statements against prod_txn inside a transaction.
Run DELETE FROM prod_txn USING stage_txn WHERE prod_txn.txn_id = stage_txn.txn_id; then INSERT INTO prod_txn SELECT * FROM stage_txn;
Insert DISTINCT txn_id and MAX(updated_at) from stage_txn into a temporary table, join it back to stage_txn to find full rows, insert those into prod_txn, and run VACUUM DELETE on prod_txn afterwards.
The MERGE option is correct. Amazon Redshift supports the MERGE command, which updates matching rows and inserts new ones in one atomic statement. Feeding MERGE a filtered source subquery that keeps only the newest row for each txn_id (ROW_NUMBER with PARTITION BY txn_id ORDER BY updated_at DESC, filtered by QUALIFY to row number 1) guarantees that exactly one version of each key is evaluated. Because a single statement handles both the update and the insert, no extra deletes, temp tables, or multi-step scripts are required. The other choices each fail at least one requirement: the view approach needs separate UPDATE and INSERT statements; DELETE followed by INSERT is two statements and copies every duplicate in stage_txn into prod_txn; and the temporary-table approach is multi-step, does not replace rows already in prod_txn, and relies on a manual VACUUM DELETE.
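The "latest row per key, then upsert in one statement" logic can be tried locally with a small sketch. This is not Redshift: it uses Python's built-in sqlite3 module as a stand-in, so MERGE and QUALIFY are replaced by SQLite's INSERT ... ON CONFLICT DO UPDATE and a ranked subquery filtered with WHERE rn = 1. The amount column and the sample rows are hypothetical, added only so there is something to update. It assumes the bundled SQLite is version 3.25 or newer (window functions and upsert).

```python
import sqlite3

# In-memory database as a local stand-in for Redshift (assumption: SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'amount' is a hypothetical payload column; the source names only txn_id and updated_at.
    CREATE TABLE stage_txn (txn_id INTEGER, updated_at TEXT, amount REAL);
    CREATE TABLE prod_txn  (txn_id INTEGER PRIMARY KEY, updated_at TEXT, amount REAL);

    -- prod_txn already holds an older version of txn_id 1.
    INSERT INTO prod_txn VALUES (1, '2024-01-01 09:00:00', 10.0);

    -- stage_txn holds two versions of txn_id 1 and one new txn_id 2.
    INSERT INTO stage_txn VALUES
        (1, '2024-01-02 08:00:00', 11.0),
        (1, '2024-01-02 12:00:00', 12.0),
        (2, '2024-01-02 10:00:00', 20.0);
""")

# One statement: rank rows per txn_id by updated_at (newest first), keep rank 1,
# then insert new keys and update existing ones.
conn.execute("""
    INSERT INTO prod_txn (txn_id, updated_at, amount)
    SELECT txn_id, updated_at, amount
    FROM (
        SELECT txn_id, updated_at, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY txn_id ORDER BY updated_at DESC
               ) AS rn
        FROM stage_txn
    ) AS ranked
    WHERE rn = 1
    ON CONFLICT (txn_id) DO UPDATE SET
        updated_at = excluded.updated_at,
        amount     = excluded.amount
""")
conn.commit()

rows = conn.execute(
    "SELECT txn_id, updated_at, amount FROM prod_txn ORDER BY txn_id"
).fetchall()
print(rows)
```

After the statement runs, prod_txn holds one row per txn_id: the 12:00 version of txn_id 1 replaces the older row, and txn_id 2 is inserted. In Redshift the same ranking would sit in the MERGE source with QUALIFY, as the correct answer shows.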
Exam domain: Data Ingestion and Transformation