Microsoft Fabric Data Engineer Associate DP-700 Practice Question

You manage a Microsoft Fabric lakehouse that contains a Delta table named Transactions, to which 500 million rows are added each month. For reporting, you must create a nightly process that produces a summary table containing the total sales amount and order count for every customer and month. The solution must use PySpark, minimize shuffle-related memory usage, and store the result as a managed Delta table for downstream queries. Which approach meets the requirements?

  • Run a Spark SQL statement that uses GROUPING SETS to aggregate the data, then write the output in Parquet format without partitions.

  • Use the following PySpark code:

        df.groupBy("customerId", "sale_month") \
          .agg(sum_("amount").alias("total_amount"),
               count("*").alias("order_count")) \
          .write.format("delta") \
          .partitionBy("customerId", "sale_month") \
          .mode("overwrite") \
          .saveAsTable("SalesMonthly")

  • Convert the table to an RDD, use map and reduceByKey to calculate sums and counts, then save the results as a single CSV file.

  • Call df.groupBy("customerId").pivot("sale_month").sum("amount").write.format("delta").mode("overwrite").saveAsTable("SalesMonthly") without partitioning.
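For reference, here is a minimal runnable sketch of the groupBy/agg/saveAsTable pattern that the second option presents, assuming a Fabric Spark session (where Delta is the default table format) and assuming the Transactions table already exposes the customerId, amount, and sale_month columns named in the question:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count, sum as sum_  # alias avoids shadowing the built-in sum

    spark = SparkSession.builder.getOrCreate()

    # Read the source Delta table from the lakehouse.
    df = spark.read.table("Transactions")

    # Aggregate once per (customer, month): total sales amount and order count.
    summary = df.groupBy("customerId", "sale_month").agg(
        sum_("amount").alias("total_amount"),
        count("*").alias("order_count"),
    )

    # Persist the result as a managed Delta table for downstream queries.
    (summary.write
        .format("delta")
        .partitionBy("customerId", "sale_month")  # partition columns as given in the option
        .mode("overwrite")
        .saveAsTable("SalesMonthly"))

Note that partitioning by a high-cardinality key such as customerId can produce a very large number of small files; in practice, many teams partition only by sale_month and rely on Delta maintenance such as OPTIMIZE for file compaction.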

Objective: Ingest and transform data