Microsoft Fabric Data Engineer Associate (DP-700) Practice Question

You manage a Microsoft Fabric lakehouse that contains a Delta table named Transactions, which grows by 500 million rows each month. For reporting, you must build a nightly process that produces a summary table containing the total sales amount and order count for each customer and month. The solution must use PySpark, minimize shuffle-related memory usage, and store the result as a managed Delta table for downstream queries. Which approach meets the requirements?

  • Call df.groupBy("customerId").pivot("sale_month").sum("amount").write.format("delta").mode("overwrite").saveAsTable("SalesMonthly") without partitioning.

  • Convert the table to an RDD, use map and reduceByKey to calculate sums and counts, then save the results as a single CSV file.

  • Use df.groupBy("customerId", "sale_month").agg(sum_("amount").alias("total_amount"), count("*").alias("order_count")).write.format("delta").partitionBy("customerId", "sale_month").mode("overwrite").saveAsTable("SalesMonthly").

  • Run a Spark SQL statement that uses GROUPING SETS to aggregate the data, then write the output in Parquet format without partitions.
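
For reference, here is a minimal runnable sketch of the groupBy/agg pattern that the third option shows as a single chained call. The Transactions table and the customerId, sale_month, and amount columns come from the question; the session setup and read call are assumptions added only to make the snippet self-contained (in a Fabric notebook, a spark session already exists).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count, sum as sum_

    # Assumption: create a session so the sketch runs outside Fabric,
    # where a `spark` session is normally pre-configured.
    spark = SparkSession.builder.getOrCreate()

    # Read the lakehouse Delta table named in the question.
    df = spark.read.table("Transactions")

    # Aggregate once per (customer, month): one shuffle keyed on the
    # grouping columns, with no wide pivoted result held in memory.
    summary = df.groupBy("customerId", "sale_month").agg(
        sum_("amount").alias("total_amount"),
        count("*").alias("order_count"),
    )

    # Persist the result as a managed, partitioned Delta table so
    # downstream queries can read it directly by name.
    (summary.write
        .format("delta")
        .partitionBy("customerId", "sale_month")
        .mode("overwrite")
        .saveAsTable("SalesMonthly"))

Note the sum_ alias: pyspark.sql.functions.sum is imported as sum_ to avoid shadowing Python's built-in sum, which is why the option's code fragment uses that name.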

Objective: Ingest and transform data