Microsoft Fabric Data Engineer Associate DP-700 Practice Question
A nightly Microsoft Fabric pipeline loads a Parquet file to a bronze folder and then upserts data into a silver Delta Lake table named Customers. The file may repeat customer_id values because of late updates or replayed files. You need the silver table to keep only the newest updated_at row per customer_id and allow safe re-runs without new duplicates. Which approach should you use?
A Spark notebook that reads the file, writes it to the Customers Delta table in append mode, and then runs OPTIMIZE ZORDER BY (updated_at).
A Spark notebook that executes a Delta Lake MERGE INTO Customers USING the nightly DataFrame ON customer_id, updating the row only when the incoming updated_at value is greater and inserting otherwise.
A Spark notebook that calls dropDuplicates("customer_id") on the DataFrame and overwrites the Customers table on each load.
A Data Factory copy activity that writes the file to the lakehouse with the preserveHierarchy option set to true and skipDuplicates enabled.
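A Delta Lake MERGE performs an idempotent upsert. Spark matches rows on customer_id. When a customer_id already exists, the WHEN MATCHED clause compares updated_at values and updates the row only if the incoming record is newer; otherwise the incoming row is ignored. When the customer_id is not found, the row is inserted. Each MERGE commits as a single transaction, and because the match condition leaves rows that are not newer untouched, rerunning the pipeline with the same file adds no duplicates. If one file can contain several rows for the same customer_id, reduce the nightly DataFrame to the newest row per key before merging: Delta Lake fails a MERGE in which more than one source row tries to update the same target row, and several unmatched source rows that share a new key would each be inserted.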
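The upsert described above can be sketched in a notebook roughly as follows. This is a minimal sketch, not a definitive implementation: the function names build_merge_sql and upsert_customers and the temporary view name nightly_customers_latest are hypothetical, and it assumes the nightly DataFrame has the same columns as the Customers table so that UPDATE SET * and INSERT * resolve by column name.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only; the notebook runtime provides PySpark
    from pyspark.sql import DataFrame, SparkSession


def build_merge_sql(target: str, source_view: str) -> str:
    """Return a Delta Lake MERGE statement keyed on customer_id (hypothetical helper)."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source_view} AS s\n"
        "ON t.customer_id = s.customer_id\n"
        # Update only when the incoming row is strictly newer, so a replayed
        # file with identical timestamps changes nothing.
        "WHEN MATCHED AND s.updated_at > t.updated_at THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_customers(spark: SparkSession, nightly_df: DataFrame) -> None:
    """Deduplicate the nightly file, then merge it into the Customers table."""
    # Keep one row per customer_id within the file: the one with the latest
    # updated_at. Rows that tie on updated_at are resolved arbitrarily.
    latest = (
        nightly_df.selectExpr(
            "*",
            "row_number() OVER (PARTITION BY customer_id "
            "ORDER BY updated_at DESC) AS rn",
        )
        .where("rn = 1")
        .drop("rn")
    )
    # Assumed view name; expose the deduplicated rows to Spark SQL.
    latest.createOrReplaceTempView("nightly_customers_latest")
    spark.sql(build_merge_sql("Customers", "nightly_customers_latest"))
```

In a notebook where a spark session is already defined, this could be called as upsert_customers(spark, spark.read.parquet("Files/bronze/customers/")), where the folder path is a made-up placeholder for the bronze location. Deduplicating before the merge is a design choice that avoids the multiple-match failure and keeps the merge condition simple.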
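dropDuplicates keeps an arbitrary row per customer_id, not necessarily the one with the latest updated_at, so it cannot guarantee the newest row when late updates arrive. Writing with mode("overwrite") replaces the whole table with the contents of the nightly file, which discards any customer that is not in that file. Append mode adds every incoming row again on each run, and OPTIMIZE with ZORDER BY only improves query performance; it does not remove duplicates. In the copy activity, preserveHierarchy is a copy behavior for file-based destinations that only controls folder layout, skipDuplicates is not an available setting for writing to a lakehouse Delta table, and neither evaluates updated_at timestamps.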
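The difference between a blind append and the conditional upsert can be illustrated with a toy in-memory model. This is plain Python over lists and dicts, not Spark or Delta Lake; the functions append and merge_newest and the sample rows are invented for illustration, and unlike a real MERGE the model processes repeated keys in the batch one row at a time.

```python
from datetime import datetime


def append(table, batch):
    """Model of append mode: every run adds the whole batch again."""
    return table + list(batch)


def merge_newest(table, batch):
    """Model of the conditional upsert: keep the newest row per customer_id."""
    by_id = {row["customer_id"]: row for row in table}
    for row in batch:
        current = by_id.get(row["customer_id"])
        # Insert when the key is new; update only when strictly newer.
        if current is None or row["updated_at"] > current["updated_at"]:
            by_id[row["customer_id"]] = row
    return list(by_id.values())


# Sample nightly file with a repeated customer_id (late update for customer 1).
batch = [
    {"customer_id": 1, "updated_at": datetime(2024, 1, 1), "name": "Ann"},
    {"customer_id": 1, "updated_at": datetime(2024, 1, 3), "name": "Anne"},
    {"customer_id": 2, "updated_at": datetime(2024, 1, 2), "name": "Bob"},
]

merged_once = merge_newest([], batch)
merged_twice = merge_newest(merged_once, batch)   # replay the same file
appended_twice = append(append([], batch), batch)

print(len(merged_once), len(merged_twice), len(appended_twice))  # prints: 2 2 6
```

Replaying the file leaves the merged result unchanged, while the appended table grows on every run.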
Microsoft Fabric Data Engineer Associate DP-700
Ingest and transform data