AWS Certified Data Engineer Associate DEA-C01 Practice Question
An Amazon EMR cluster running Hive joins a 4 TB sales_fact table with a 10-row region_dim table on region_id. One region_id accounts for 70% of the fact rows, so the reducer handling that key runs hours longer than the others. Which change best mitigates the data skew while requiring the smallest modification to the existing Hive query?
Convert the join to a map-side broadcast by adding the MAPJOIN hint (or enabling hive.auto.convert.join) so region_dim is replicated to every mapper.
Bucket the sales_fact table by region_id and force a bucket map join.
Enable speculative execution for reducers to launch duplicate tasks on slow reducers.
Increase hive.exec.reducers.max to create more reducers for the job.
Broadcasting the tiny region_dim table to every mapper (a map-side join) removes the shuffle and reduce phase from the join. Each mapper loads the 10-row dimension into memory and joins its own input split locally, so work is divided by split rather than by join key and no single task is responsible for the skewed key. Raising the reducer count does not help because every row with the skewed key still hashes to the same reducer. Bucketing by region_id simply concentrates the skew in one large bucket, and it also requires rewriting the 4 TB table. Speculative execution may launch duplicate copies of the slow reducer, but each copy must process the same oversized partition, so the imbalance remains. Therefore, converting the join to a map-side broadcast is the most effective, low-impact solution.
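The difference between the two strategies can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hive's actual implementation: the row count, the function names `reduce_side_load` and `map_side_load`, and the round-robin assignment of rows to mappers are all hypothetical stand-ins chosen to mirror the 70% skew in the question.

```python
from collections import Counter

# Hypothetical toy data: 10 region ids, with region 0 holding 70% of the
# fact rows, mirroring the proportions in the question (not the 4 TB size).
NUM_ROWS = 10_000
fact_region_ids = [0 if i % 10 < 7 else 1 + (i % 9) for i in range(NUM_ROWS)]
region_dim = {rid: f"region_{rid}" for rid in range(10)}


def reduce_side_load(keys, num_reducers):
    """Rows per reducer when the shuffle partitions on hash(join key)."""
    return Counter(hash(k) % num_reducers for k in keys)


def map_side_load(keys, num_mappers, dim):
    """Rows joined per mapper when every mapper holds a full copy of dim.

    Rows are assigned to mappers by input position (an assumed stand-in
    for input splits), not by join key, so the hot key is spread out.
    """
    load = Counter()
    for position, key in enumerate(keys):
        _ = dim[key]  # local in-memory lookup replaces the shuffle
        load[position % num_mappers] += 1
    return load


# Adding reducers does not shrink the busiest reducer's share.
for n in (10, 100):
    busiest = max(reduce_side_load(fact_region_ids, n).values())
    print(f"{n} reducers: busiest handles {busiest} rows")

# With the dimension broadcast, the load follows the input splits instead.
busiest = max(map_side_load(fact_region_ids, 10, region_dim).values())
print(f"10 mappers: busiest handles {busiest} rows")
```

In this toy model the busiest reducer keeps all rows for the hot key whether there are 10 or 100 reducers, while the map-side version divides the same rows evenly across mappers because the join key no longer decides where a row is processed.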
AWS Certified Data Engineer Associate DEA-C01
Data Operations and Support