Microsoft Fabric Data Engineer Associate DP-700 Practice Question

You manage a Microsoft Fabric lakehouse that contains a Delta table named Transactions, to which 500 million rows are added each month. For reporting, you must create a nightly process that produces a summary table containing the total sales amount and order count for every customer and month. The solution must use PySpark, minimize shuffle-related memory usage, and store the result as a managed Delta table for downstream queries. Which approach meets the requirements?

  • Run a Spark SQL statement that uses GROUPING SETS to aggregate the data, then write the output in Parquet format without partitions.

  • Use the following PySpark code:

        df.groupBy("customerId", "sale_month") \
          .agg(sum_("amount").alias("total_amount"),
               count("*").alias("order_count")) \
          .write.format("delta") \
          .partitionBy("customerId", "sale_month") \
          .mode("overwrite") \
          .saveAsTable("SalesMonthly")

  • Convert the table to an RDD, use map and reduceByKey to calculate sums and counts, then save the results as a single CSV file.

  • Call df.groupBy("customerId").pivot("sale_month").sum("amount").write.format("delta").mode("overwrite").saveAsTable("SalesMonthly") without partitioning.
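For reference, here is a minimal runnable sketch of the groupBy/agg/saveAsTable pattern that the second option presents, assuming a Fabric Spark session (where Delta is the default table format) and assuming the Transactions table already exposes the customerId, amount, and sale_month columns named in the question:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count, sum as sum_  # alias avoids shadowing the built-in sum

    spark = SparkSession.builder.getOrCreate()

    # Read the source Delta table from the lakehouse.
    df = spark.read.table("Transactions")

    # Aggregate once per (customer, month): total sales amount and order count.
    summary = df.groupBy("customerId", "sale_month").agg(
        sum_("amount").alias("total_amount"),
        count("*").alias("order_count"),
    )

    # Persist the result as a managed Delta table for downstream queries.
    (summary.write
        .format("delta")
        .partitionBy("customerId", "sale_month")  # partition columns as given in the option
        .mode("overwrite")
        .saveAsTable("SalesMonthly"))

Note that partitioning by a high-cardinality key such as customerId can produce a very large number of small files; in practice, many teams partition only by sale_month and rely on Delta maintenance such as OPTIMIZE for file compaction.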

Objective: Ingest and transform data