Microsoft Fabric Data Engineer Associate DP-700 Practice Question
You ingest daily customer order files into a lakehouse table. Each file may contain multiple versions of the same CustomerID and OrderDate combination, distinguished by a LastModified timestamp column. Using a PySpark DataFrame, you must keep only the most recent version of each record before loading to Silver. Which transformation should you implement?
Sort the DataFrame by LastModified descending and then call distinct().
Use groupBy("CustomerID", "OrderDate").agg({"*": "max"}) to collapse the rows.
Define a Window partitioned by ["CustomerID", "OrderDate"] and ordered by col("LastModified").desc(), add a row_number() column over that window, and keep only the rows where that column equals 1.
Apply dropDuplicates(["CustomerID", "OrderDate"]) to the DataFrame.
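The reliable way to retain the newest record per composite key is a Window specification that partitions the DataFrame by CustomerID and OrderDate and orders each partition by LastModified descending. Assigning row_number() over that window and keeping only the rows numbered 1 returns the latest version of each record with all of its columns intact. dropDuplicates on the key columns keeps one row per key, but which row survives is not guaranteed to be the latest, because Spark does not promise that an earlier sort is preserved through the shuffle. distinct removes only rows that are identical in every column; versions of the same record differ in LastModified, so they all survive regardless of sort order. Aggregating with max evaluates each column independently, so the result can combine values from different versions instead of returning one actual row. The window approach therefore satisfies both requirements: it selects the correct version and it returns the complete row.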
Ingest and transform data