Microsoft Fabric Data Engineer Associate DP-700 Practice Question

A Fabric lakehouse contains a Delta table named FactSales with 400 million rows. A Spark notebook joins this table to a lookup DataFrame created from a 5,000-row dimension table and then aggregates the result. The join currently runs for about 10 to 12 minutes and shows heavy shuffle activity in the Spark UI. You cannot increase the session's compute resources. Which change will most likely reduce the job's runtime?

Apply the broadcast join hint to the dimension DataFrame before performing the join.
Repartition the FactSales DataFrame to 5,000 partitions with the repartition() method before the join.
Increase the value of spark.sql.shuffle.partitions from 200 to 10,000.
Convert both tables to CSV files and read them with spark.read.csv to avoid Delta overhead.

Microsoft Fabric Data Engineer Associate DP-700

Monitor and optimize an analytics solution

Answer Description
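
The broadcast hint targets the shuffle directly: a 5,000-row dimension is small enough to copy to every executor, so the 400-million-row FactSales table can be joined where its partitions already sit instead of being redistributed by key. Repartitioning FactSales adds another full shuffle, raising spark.sql.shuffle.partitions changes how many shuffle partitions are produced without removing the shuffle, and CSV gives up the columnar Parquet storage that Delta relies on. In PySpark the hint is typically applied by wrapping the small DataFrame in pyspark.sql.functions.broadcast() inside the join call. The plain-Python sketch below only illustrates the mechanism and is not Spark's implementation; the column names (ProductKey, Category, Amount) and the sample rows are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(fact_partitions, dim_rows, key):
    """Inner-join every fact partition against a full local copy of the small table."""
    # Build the lookup once. In Spark, this small hash table is what gets
    # copied ("broadcast") to each executor.
    lookup = {row[key]: row for row in dim_rows}
    joined = []
    for partition in fact_partitions:
        # Fact rows are processed inside their own partition; none are
        # moved to another partition to find their match.
        for row in partition:
            match = lookup.get(row[key])
            if match is not None:
                joined.append({**match, **row})
    return joined


def total_by(rows, group_col, value_col):
    """Sum value_col per group_col (stands in for the aggregation step)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


# Hypothetical stand-ins for FactSales partitions and the dimension table.
fact_partitions = [
    [{"ProductKey": 1, "Amount": 10.0}, {"ProductKey": 2, "Amount": 5.0}],
    [{"ProductKey": 1, "Amount": 7.5}, {"ProductKey": 3, "Amount": 2.0}],
]
dim_rows = [
    {"ProductKey": 1, "Category": "Bikes"},
    {"ProductKey": 2, "Category": "Helmets"},
]

joined = broadcast_hash_join(fact_partitions, dim_rows, "ProductKey")
totals = total_by(joined, "Category", "Amount")
```

The only data copied around is the small lookup table; the large side stays in place. The aggregation that follows the join may still shuffle in Spark, but it no longer has to move the whole fact table just to perform the join.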
