Microsoft Fabric Data Engineer Associate DP-700 Practice Question

A data engineer schedules a Microsoft Fabric notebook from a Data Factory pipeline. The notebook loads 200 GB of JSON files into a Delta table by using the following code in the first cell:

(spark.read
      .option("multiline", "true")
      .json("/lake/raw/transactions")
      .write
      .format("delta")
      .mode("overwrite")
      .saveAsTable("finance.transactions_raw"))

When the pipeline runs, the notebook fails after several minutes. The run details page for the notebook activity shows the following error message:

java.lang.OutOfMemoryError: Java heap space

You need to modify the notebook so that the next execution succeeds without changing the pipeline definition or the underlying capacity. Which change should you make?

  • Add .coalesce(1) after spark.read to consolidate the data before writing.

  • Call repartition(200) on the DataFrame before writing it.

  • Persist the DataFrame with .persist() before writing it to the Delta table.

  • Insert spark.conf.set("spark.sql.shuffle.partitions", "8") at the start of the notebook.
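For reference, the sketch below shows where each proposed change would sit in the notebook cell. It is an illustration only: it reuses the source path and table name from the question and does not indicate which option resolves the error.

# Illustrative placements only; `spark` is the session provided by the Fabric notebook.

# Option: set the shuffle partition count at the start of the notebook.
spark.conf.set("spark.sql.shuffle.partitions", "8")

df = (spark.read
      .option("multiline", "true")
      .json("/lake/raw/transactions"))

# Option: consolidate into a single partition, or redistribute across 200 partitions.
df = df.coalesce(1)          # or: df = df.repartition(200)

# Option: persist the DataFrame before it is written.
df = df.persist()

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("finance.transactions_raw"))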

Objective: Monitor and optimize an analytics solution