AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS Glue Spark job transforms 100 GB of CSV data in Amazon S3 and loads it into an Amazon Redshift cluster through a JDBC connection. The job's run time has increased from 15 to 45 minutes. CloudWatch metrics show Glue CPU utilization under 30 percent, while Redshift shows commit-queue waits and a saturated WLM load queue. Without adding more DPUs or cluster nodes, which action will MOST reduce job duration?
A. Reduce the number of WLM query slots from 5 to 2 to give each slot more memory.
B. Write the transformed data to Amazon S3 in Parquet format and have the job invoke an Amazon Redshift COPY command to load the files.
C. Enable Amazon Redshift Concurrency Scaling for the WLM queue used by the job.
D. Double the Glue worker count and upgrade from G.1X to G.2X workers.
Correct answer: B. The JDBC connection issues INSERT statements for each Spark partition, creating thousands of small transactions that overwhelm the Redshift commit queue. Staging the transformed output as Parquet files on Amazon S3 and then running a COPY command performs a single, parallel bulk load, dramatically reducing commit overhead and queue contention. Increasing Glue workers only adds idle compute because Redshift is the bottleneck. Concurrency Scaling or changing WLM slot counts may help query concurrency, but neither eliminates the large number of commits created by the small INSERT transactions.
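To make the staged-load pattern concrete, here is a minimal PySpark sketch of how the Glue job could write Parquet to S3 and then trigger a single COPY through the Redshift Data API instead of loading over JDBC. The bucket paths, table name, IAM role ARN, cluster identifier, and database user are hypothetical placeholders, and the actual transformation logic is omitted.

```python
import boto3
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Runs inside an AWS Glue Spark job. All bucket names, the table name, the IAM
# role ARN, and the cluster identifier below are hypothetical placeholders.
glue_context = GlueContext(SparkContext.getOrCreate())

# Read the source CSV data (real transformation logic would follow).
source_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-source-bucket/csv/"]},
    format="csv",
    format_options={"withHeader": True},
)
transformed_dyf = source_dyf  # apply the actual transformations here

# 1. Stage the transformed output on S3 as Parquet instead of writing rows over JDBC.
staging_path = "s3://example-staging-bucket/redshift-load/"
glue_context.write_dynamic_frame.from_options(
    frame=transformed_dyf,
    connection_type="s3",
    connection_options={"path": staging_path},
    format="parquet",
)

# 2. Issue one COPY so Redshift performs a single parallel bulk load with one commit.
copy_sql = (
    "COPY analytics.sales_facts "
    f"FROM '{staging_path}' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
    "FORMAT AS PARQUET;"
)
redshift_data = boto3.client("redshift-data")
response = redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="etl_user",
    Sql=copy_sql,
)
# execute_statement is asynchronous; poll describe_statement(Id=response["Id"])
# until the COPY completes before ending the job.
```

Because the Data API call returns immediately, a production job would poll the statement status (or otherwise verify the load) before reporting success; the key point for the exam is that the load becomes one bulk COPY rather than many small JDBC transactions.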
Related questions to review:
Why is the Amazon Redshift COPY command more efficient than INSERT statements?
What are the advantages of using Parquet over CSV for data transformation and storage?
What is the role of Amazon S3 in staging transformed data before loading it into Redshift?
What is the difference between using a JDBC connection and a Redshift COPY command for loading data?
Why is transforming data to Parquet format advantageous for Redshift processing?
What is the WLM load queue in Redshift, and how does it impact performance?
Domain: Data Operations and Support