Microsoft Fabric Data Engineer Associate DP-700 Practice Question
A data engineer schedules a Microsoft Fabric notebook from a Data Factory pipeline. The notebook loads 200 GB of JSON files into a Delta table by using the following code in the first cell:
When the pipeline runs, the notebook fails after several minutes. The run details page for the notebook activity shows the following error message:
java.lang.OutOfMemoryError: Java heap space
You need to modify the notebook so that the next execution succeeds without changing the pipeline definition or the underlying capacity. Which change should you make?
Add .coalesce(1) after spark.read to consolidate the data before writing.
Call repartition(200) on the DataFrame before writing it.
Persist the DataFrame with .persist() before writing it to the Delta table.
Insert spark.conf.set("spark.sql.shuffle.partitions", "8") at the start of the notebook.
The correct change is to call repartition(200) on the DataFrame before writing it. The error is thrown because the driver's JVM runs out of heap memory while the entire 200 GB JSON dataset is read into a single DataFrame before writing. Repartitioning the input immediately after reading breaks the job into many smaller tasks, which reduces the amount of data each executor and the driver must hold in memory at one time. Persisting the DataFrame with .persist() would increase, not decrease, memory pressure. Using coalesce(1) moves all data into a single partition and makes the memory problem worse. Lowering spark.sql.shuffle.partitions to 8 affects only shuffle stages and does nothing for the initial read that is causing the heap-space failure.
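The change described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code, which is not shown in the question. The function name, the source path, the table name, and the "append" write mode are all assumptions made for the example; `spark` is the session object that a Fabric notebook normally provides.

```python
def load_json_to_delta(spark, source_path, target_table, num_partitions=200):
    """Read JSON files and write them to a Delta table in many small partitions.

    All argument values are hypothetical; adjust them to the real workload.
    """
    # Read the raw JSON files into a DataFrame.
    df = spark.read.json(source_path)

    # Spread the rows across num_partitions partitions so that each write
    # task handles only a slice of the data instead of one oversized chunk.
    repartitioned = df.repartition(num_partitions)

    # Write to the Delta table. "append" is an assumed mode for this sketch.
    repartitioned.write.format("delta").mode("append").saveAsTable(target_table)
    return repartitioned


# Hypothetical call from a notebook cell, where `spark` already exists:
# load_json_to_delta(spark, "Files/raw/events/*.json", "events_bronze")
```

With 200 partitions, each task works on roughly 1 GB of the 200 GB source if the rows spread evenly, although the in-memory size of parsed JSON can differ from its size on disk.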
Exam objective: Monitor and optimize an analytics solution