Microsoft Fabric Data Engineer Associate DP-700 Practice Question

You are developing a PySpark notebook in Microsoft Fabric that joins a 2-TB fact table to three dimension tables, each about 100 MB. Execution metrics show most time is spent on shuffle reads during the joins. Without resizing the Spark pool, you want the dimension tables broadcast to executors to cut shuffle time. Which Spark configuration should you set before running the notebook?

Increase the value of spark.sql.autoBroadcastJoinThreshold to 134217728 (128 MB).
Lower spark.sql.shuffle.partitions to 50 to reduce the number of shuffle partitions.
Set spark.sql.files.maxPartitionBytes to 134217728 bytes so that fewer input partitions are created.
Enable adaptive query execution by setting spark.sql.adaptive.enabled to true.
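
As a rough illustration of the broadcast-threshold option, the sketch below works through the byte arithmetic and shows how the setting could be applied from a notebook. It assumes `spark` is the SparkSession that a Fabric notebook provides, that each dimension table's estimated size is about 100 MB as the question states, and that Spark's default threshold of 10 MB is in effect. The helper names are hypothetical, and the eligibility check is a simplified model of the planner's size test, not the planner itself.

```python
# Byte arithmetic for the values quoted in the options.
MB = 1024 * 1024
DEFAULT_THRESHOLD = 10 * MB    # Spark's documented default: 10485760 bytes
RAISED_THRESHOLD = 128 * MB    # 134217728 bytes, the value in the first option


def eligible_for_auto_broadcast(estimated_bytes, threshold_bytes):
    """Simplified model (assumption): a join side is auto-broadcast when its
    estimated size is non-negative and no larger than the threshold.
    A threshold of -1 disables automatic broadcasting."""
    if threshold_bytes < 0:
        return False
    return 0 <= estimated_bytes <= threshold_bytes


def raise_broadcast_threshold(spark, threshold_bytes=RAISED_THRESHOLD):
    """Set the session-level threshold before the joins run.
    `spark` is assumed to be the notebook's SparkSession."""
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(threshold_bytes))


# Hypothetical size estimate for one of the three dimension tables.
dimension_bytes = 100 * MB

print(eligible_for_auto_broadcast(dimension_bytes, DEFAULT_THRESHOLD))
print(eligible_for_auto_broadcast(dimension_bytes, RAISED_THRESHOLD))
```

In a notebook, `raise_broadcast_threshold(spark)` would be called in a cell before the join cells. Spark decides based on its own size estimate for each table, which can differ from the on-disk size, so an explicit `pyspark.sql.functions.broadcast(df)` hint on each dimension DataFrame is another way to request the same behavior.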

Monitor and optimize an analytics solution
