Microsoft Fabric Data Engineer Associate DP-700 Practice Question
You are developing a PySpark notebook in Microsoft Fabric that joins a 2-TB fact table to three dimension tables, each about 100 MB. Execution metrics show most time is spent on shuffle reads during the joins. Without resizing the Spark pool, you want the dimension tables broadcast to executors to cut shuffle time. Which Spark configuration should you set before running the notebook?
Increase the value of spark.sql.autoBroadcastJoinThreshold to 134217728 (128 MB).
Lower spark.sql.shuffle.partitions to 50 to reduce the number of shuffle partitions.
Set spark.sql.files.maxPartitionBytes to 134217728 bytes so that fewer input partitions are created.
Enable adaptive query execution by setting spark.sql.adaptive.enabled to true.
Spark can avoid expensive shuffle joins by broadcasting a small table to every executor, which turns the operation into a map-side (broadcast hash) join and leaves the large table unshuffled. Spark broadcasts a table automatically only if its estimated size does not exceed the value of the spark.sql.autoBroadcastJoinThreshold configuration setting, whose default is 10 MB. Because each dimension table is about 100 MB, you must raise this threshold. Setting it to 134,217,728 bytes (128 MB) lets all three dimension tables qualify for automatic broadcasting, so the 2-TB fact table no longer has to be shuffled for these joins.
The other options do not change the join strategy. Adjusting spark.sql.shuffle.partitions only changes the number of shuffle partitions; it does not cause broadcasting. Enabling adaptive query execution can improve some plans, but by itself it does not raise the broadcast threshold. Changing spark.sql.files.maxPartitionBytes affects how input files are split into partitions, not how joins are executed. Therefore, increasing spark.sql.autoBroadcastJoinThreshold is the correct action.
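The threshold arithmetic in the explanation can be checked with a short sketch. It is only an illustration under stated assumptions: the table names are hypothetical, the sizes are the approximate 100 MB figures from the question, and the size check is a simplified model of the planner's comparison, not Spark's actual code. The final lines apply the setting only when a `spark` session object already exists, as it does in a Fabric notebook.

```python
# 128 MB expressed in bytes, the value given in the correct answer.
THRESHOLD_BYTES = 128 * 1024 * 1024  # 134217728

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024

# Approximate dimension table sizes from the scenario.
# The table names are hypothetical and used only for illustration.
dim_sizes_bytes = {
    "dim_customer": 100 * 1024 * 1024,
    "dim_product": 100 * 1024 * 1024,
    "dim_date": 100 * 1024 * 1024,
}


def qualifies_for_broadcast(size_bytes, threshold_bytes):
    """Simplified model: a table qualifies when its estimated size
    is non-negative and does not exceed the threshold."""
    return 0 <= size_bytes <= threshold_bytes


eligible_default = [
    name for name, size in dim_sizes_bytes.items()
    if qualifies_for_broadcast(size, DEFAULT_THRESHOLD_BYTES)
]
eligible_raised = [
    name for name, size in dim_sizes_bytes.items()
    if qualifies_for_broadcast(size, THRESHOLD_BYTES)
]

print("Eligible at default threshold:", eligible_default)
print("Eligible at raised threshold:", eligible_raised)

# In a Fabric notebook the `spark` session is predefined; the guard lets
# this sketch run outside a notebook without a Spark installation.
if "spark" in globals():
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(THRESHOLD_BYTES))
```

Run the configuration line before the joins execute. Spark compares the threshold against its own size estimate for each table, so the actual plan may differ from this simplified check.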
Monitor and optimize an analytics solution