Microsoft Fabric Data Engineer Associate DP-700 Practice Question
You manage a Microsoft Fabric lakehouse. In a Spark notebook you frequently join a 50-MB dimension table with a 2-TB fact table on a surrogate key. The Spark UI shows the join stage dominated by a costly shuffle hash join that increases latency. Without changing cluster size, storage format, or table partitioning, what action should you take to speed up this join?
Cache the fact DataFrame in memory before performing the join.
Use coalesce(1) on both DataFrames to force them into a single partition before the join.
Add a broadcast join hint to the dimension DataFrame so Spark replicates it across all executors before the join.
Increase the value of spark.sql.shuffle.partitions to create more shuffle partitions during the join.
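Correct answer: add a broadcast join hint to the dimension DataFrame. The hint tells Spark to send a full copy of the small dimension DataFrame to every executor, where it is held in memory as a hash table. Each executor then joins its own partitions of the fact table against that local copy, so the 2-TB fact table is never shuffled across the network. At 50 MB the dimension table exceeds the open-source Spark default of 10 MB for spark.sql.autoBroadcastJoinThreshold (a given runtime may configure this differently), which is the likely reason Spark does not pick a broadcast join on its own and the explicit hint is needed. Eliminating the shuffle greatly reduces network I/O and typically provides the biggest performance gain in classic star-schema joins where one side is much smaller than the other. Increasing spark.sql.shuffle.partitions only changes how many shuffle partitions are produced; the fact table is still shuffled. Calling coalesce(1) on both DataFrames removes parallelism and pushes the entire join through a single task, which slows execution. Caching the 2-TB fact table consumes large amounts of memory and the join still requires a shuffle, so it offers limited benefit compared to broadcasting the small table.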
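The mechanics can be illustrated with a small plain-Python sketch. It is not Spark code and does not reflect Spark's internals; the toy data, the three-partition layout, and the function names `broadcast_hash_join` and `shuffle_rows_moved` are all assumptions made up for illustration. It shows that when every partition holds a full copy of the dimension, fact rows are joined where they already sit, whereas a hash repartition on the key would have to move rows between partitions.

```python
# Hypothetical toy data: fact rows are (surrogate_key, amount), already split
# across three "executor" partitions; dimension rows are (surrogate_key, name).
fact_partitions = [
    [(1, 10.0), (2, 5.0)],
    [(2, 7.5), (3, 1.0)],
    [(1, 2.5), (3, 4.0)],
]
dimension = [(1, "Bikes"), (2, "Helmets"), (3, "Gloves")]


def broadcast_hash_join(fact_parts, dim_rows):
    """Join each fact partition in place against a full copy of the dimension."""
    # The "broadcast" copy: a hash table every executor would hold locally.
    lookup = dict(dim_rows)
    joined = []
    for part in fact_parts:
        # Probe the local hash table; no fact row leaves its partition.
        joined.append([(k, amt, lookup[k]) for k, amt in part if k in lookup])
    return joined


def shuffle_rows_moved(fact_parts, num_partitions):
    """Count fact rows that a hash repartition on the key would relocate."""
    moved = 0
    for idx, part in enumerate(fact_parts):
        for key, _ in part:
            # A row moves when its target partition differs from where it sits.
            if key % num_partitions != idx:
                moved += 1
    return moved


if __name__ == "__main__":
    for i, part in enumerate(broadcast_hash_join(fact_partitions, dimension)):
        print(f"partition {i}: {part}")
    print("rows a shuffle would move:", shuffle_rows_moved(fact_partitions, 3))
```

In a Spark notebook the hint itself is usually applied with the `broadcast` function from `pyspark.sql.functions`, for example `fact_df.join(broadcast(dim_df), "key")`, where `fact_df`, `dim_df`, and `key` are placeholder names for the two DataFrames and the surrogate key column.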
Exam objective: Monitor and optimize an analytics solution