Microsoft Fabric Data Engineer Associate DP-700 Practice Question
You work in a Microsoft Fabric lakehouse. The Sales table has about 500 million rows, and the ProductSubcategory and ProductCategory tables each have fewer than 1,000 rows. You must build a daily Gold-layer table that denormalizes Sales with subcategory and category attributes while minimizing network shuffle and keeping the join in memory. Which Spark technique should you apply before running the joins?
Combine the three DataFrames with unionByName() and apply filters afterward.
Repartition the Sales DataFrame to a single partition, then perform the joins sequentially.
Disable Adaptive Query Execution so that Spark resorts to default shuffle hash joins.
Use the Spark broadcast() function (or BROADCAST join hint) on the two small lookup DataFrames before joining them to Sales.
Broadcasting very small lookup tables is a well-known Spark optimization. When you call broadcast() (or use the BROADCAST join hint) on ProductSubcategory and ProductCategory, Spark sends a full copy of each lookup table to every executor. Each executor can then join its partitions of the large Sales DataFrame locally and in memory, so the 500-million-row fact table is never shuffled across the network. The other options do not help: repartitioning Sales to a single partition forces all of the work onto one task and itself triggers a full shuffle; disabling AQE does not reduce shuffle and removes the runtime optimization that can convert a join to a broadcast join; and unionByName() appends rows rather than joining tables.