Microsoft Fabric Data Engineer Associate DP-700 Practice Question
A Fabric lakehouse contains a Delta table named FactSales with 400 million rows. A Spark notebook joins this table to a lookup DataFrame created from a 5 000-row dimension table and then aggregates the result. The join takes about 10-12 minutes, and the Spark UI shows heavy shuffle activity. You cannot increase the session's compute resources. Which change will most likely reduce the job's runtime?
Apply the broadcast join hint to the dimension DataFrame before performing the join.
Repartition the FactSales DataFrame to 5 000 partitions with the repartition() method before the join.
Increase the value of spark.sql.shuffle.partitions from 200 to 10 000.
Convert both tables to CSV files and read them with spark.read.csv to avoid Delta overhead.
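Broadcasting a very small table, such as the 5 000-row dimension table, makes Spark send a full copy of it to every executor. With a broadcast hash join, each executor joins its own partitions of the 400-million-row FactSales table against that in-memory copy locally, so the fact table never has to be shuffled across the network for the join. The explicit hint matters because Spark broadcasts automatically only when its size estimate for one side falls below spark.sql.autoBroadcastJoinThreshold (10 MB by default). If the estimate is missing or too large, Spark falls back to a shuffle-based sort-merge join, which matches the heavy shuffle seen here.

The other options do not remove the bottleneck. Raising spark.sql.shuffle.partitions only splits the same shuffled data into more, smaller tasks, adding scheduling overhead without reducing the amount of data moved. Repartitioning FactSales to 5 000 partitions triggers an extra full shuffle of the fact table and still leaves the join shuffle in place. Converting the Delta tables to CSV gives up ACID guarantees and the columnar Parquet storage behind Delta (column pruning, compression) while doing nothing about the shuffle. Applying the broadcast join hint to the dimension DataFrame is therefore the most effective change.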
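The difference between the two join strategies can be sketched in plain Python. This is a conceptual simulation under stated assumptions, not Spark's implementation: each inner list stands in for one FactSales partition held by an executor, the function names and toy data are hypothetical, and a shuffle-based hash join stands in for Spark's default sort-merge join (both shuffle the fact table). In an actual Fabric notebook the hint itself is applied with pyspark.sql.functions.broadcast(dim_df) or dim_df.hint("broadcast") on the dimension DataFrame before the join.

```python
from collections import defaultdict

# Conceptual simulation only: lists of tuples stand in for Spark partitions.
# Fact rows are (product_key, amount); dimension rows are (product_key, category).


def broadcast_hash_join(fact_partitions, dim_rows):
    """Inner join where the small side is copied to every partition.

    Returns (joined_rows, fact_rows_shuffled). Each partition probes its own
    copy of the lookup table in place, so no fact rows are shuffled.
    """
    lookup = dict(dim_rows)  # hash table built from the small table, "broadcast" to all executors
    joined = []
    for partition in fact_partitions:  # each executor works only on its local partition
        for key, amount in partition:
            if key in lookup:
                joined.append((key, amount, lookup[key]))
    return joined, 0


def shuffle_hash_join(fact_partitions, dim_rows, num_shuffle_partitions):
    """Inner join where both sides are hash-partitioned by key first.

    Returns (joined_rows, fact_rows_shuffled). Every fact row is routed to a
    shuffle partition, so all of them are shuffled whatever the partition count.
    """
    fact_buckets = defaultdict(list)
    fact_rows_shuffled = 0
    for partition in fact_partitions:
        for key, amount in partition:
            fact_buckets[hash(key) % num_shuffle_partitions].append((key, amount))
            fact_rows_shuffled += 1
    dim_buckets = defaultdict(dict)
    for key, category in dim_rows:
        dim_buckets[hash(key) % num_shuffle_partitions][key] = category
    joined = []
    for bucket_id, rows in fact_buckets.items():
        lookup = dim_buckets[bucket_id]  # matching keys land in the same bucket on both sides
        for key, amount in rows:
            if key in lookup:
                joined.append((key, amount, lookup[key]))
    return joined, fact_rows_shuffled


# Hypothetical toy data: three fact partitions, one unmatched key (4).
fact = [[(1, 10.0), (2, 5.0)], [(1, 7.5), (3, 2.0)], [(2, 1.0), (4, 9.9)]]
dim = [(1, "Bikes"), (2, "Helmets"), (3, "Gloves")]

bj, b_shuffled = broadcast_hash_join(fact, dim)
sj, s_shuffled = shuffle_hash_join(fact, dim, num_shuffle_partitions=200)
print(sorted(bj) == sorted(sj), b_shuffled, s_shuffled)  # True 0 6
```

Both strategies return the same rows, but only the shuffle-based join moves fact rows, and that count stays the same whether num_shuffle_partitions is 200 or 10 000. Scaled to the question's numbers, broadcasting costs roughly 5 000 rows per executor (80 000 row copies on a hypothetical 16-executor pool), whereas the shuffle-based join moves all 400 million fact rows.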