Microsoft Fabric Data Engineer Associate DP-700 Practice Question

You ingest daily customer order files into a lakehouse table. Each file may contain multiple versions of the same CustomerID and OrderDate combination, distinguished by a LastModified timestamp column. Using a PySpark DataFrame, you must keep only the most recent version of each record before loading to Silver. Which transformation should you implement?

Sort the DataFrame by LastModified descending and then call distinct().
Use groupBy("CustomerID", "OrderDate").agg({"*": "max"}) to collapse the rows.
Define a Window partitioned by ["CustomerID", "OrderDate"] ordered by col("LastModified").desc(), add row_number(), and filter where row_number() == 1.
Apply dropDuplicates(["CustomerID", "OrderDate"]) to the DataFrame.

Microsoft Fabric Data Engineer Associate DP-700

Ingest and transform data

Your Score:

Bash, the Crucial Exams Chat Bot

AI Bot

Microsoft Fabric Data Engineer Associate DP-700 Practice Question

Answer Description

Ask Bash

What is a Spark Window function?

Why can't dropDuplicates() or distinct() be used in this situation?

What is the role of row_number() in PySpark?

Monthly

$19.99

Billed monthly,
Cancel any time.

3 Month Pass

$44.99

One time purchase of $44.99,
Does not auto-renew.

Annual Pass

$119.99

One time purchase of $119.99,
Does not auto-renew.

Lifetime Pass

$189.99

One time purchase,
Good for life.

All Exams

Unlimited Tests

Unlimited Questions

AI Tutor

Track scores

Report Cards

Voucher Discounts

Advanced PBQs

Included Exams

Microsoft Fabric Data Engineer Associate DP-700 Practice Question

Report Issue

Answer Description

Ask Bash

What is a Spark Window function?

Why can't dropDuplicates() or distinct() be used in this situation?

What is the role of row_number() in PySpark?

Report Issue