Microsoft Fabric Data Engineer Associate DP-700 Practice Question

A data engineer schedules a Microsoft Fabric notebook from a Data Factory pipeline. The notebook loads 200 GB of JSON files into a Delta table by using the following code in the first cell:

(spark.read
    .option("multiline", "true")
    .json("/lake/raw/transactions")
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("finance.transactions_raw"))

When the pipeline runs, the notebook fails after several minutes. The run details page for the notebook activity shows the following error message:

java.lang.OutOfMemoryError: Java heap space

You need to modify the notebook so that the next execution succeeds without changing the pipeline definition or the underlying capacity. Which change should you make?

Add .coalesce(1) to the DataFrame returned by .json(...) to consolidate the data into a single partition before writing.
Call repartition(200) on the DataFrame before writing it.
Persist the DataFrame with .persist() before writing it to the Delta table.
Insert spark.conf.set("spark.sql.shuffle.partitions", "8") at the start of the notebook.
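
As a rough way to compare the partitioning options above, the following sketch estimates how much data each Spark task would handle if the 200 GB input were spread evenly across a given number of partitions. It is plain Python arithmetic rather than Spark code. The helper name avg_partition_size_gb is hypothetical, and the even split is an assumption that ignores data skew, compression, and the in-memory expansion of parsed JSON.

```python
def avg_partition_size_gb(total_gb: float, num_partitions: int) -> float:
    """Average data per partition, assuming a perfectly even split (an idealization)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return total_gb / num_partitions


TOTAL_GB = 200  # input size stated in the question

# Compare the two options that change the DataFrame's partition count.
for label, n in [("coalesce(1)", 1), ("repartition(200)", 200)]:
    size = avg_partition_size_gb(TOTAL_GB, n)
    print(f"{label}: ~{size:g} GB per partition")
```

Under these assumptions, a single partition leaves one task responsible for the entire dataset, while 200 partitions bring the average down to roughly 1 GB per task.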

Microsoft Fabric Data Engineer Associate DP-700

Monitor and optimize an analytics solution
