AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company stores raw clickstream logs in Amazon S3. A PySpark job converts each day's files to partitioned Parquet before analysts arrive. Daily input ranges from 20 GB to 2 TB. The team wants to minimize operational effort, pay only for compute actually used, and still finish processing within a 2-hour SLA. Which solution best meets these requirements?
Create an Amazon EMR Serverless Spark application and invoke the PySpark script with an AWS Step Functions workflow each morning.
Create an AWS Glue Spark job with G.2X worker type and increase the number of DPUs until the job completes within the SLA.
Run the job on Amazon EMR on EKS, using Spot-backed worker node groups that are scaled by Cluster Autoscaler.
Deploy a persistent EMR cluster with On-Demand core nodes and enable cluster auto scaling; schedule the PySpark job with Apache Airflow running on the master node.
The EMR Serverless option is the best fit. Amazon EMR Serverless automatically adds and removes Spark workers during the job and bills only for the vCPU-seconds and memory-seconds consumed, so there is no cluster to provision and nothing to pay for while idle. It can also scale beyond the 100-DPU (or 299-worker) limit that applies to AWS Glue Spark jobs, so the 2 TB conversion can finish within the 2-hour SLA without manual tuning.
By contrast, a persistent EMR cluster or EMR on EKS requires the team to maintain the underlying EC2 or EKS infrastructure. The AWS Glue option depends on manually adjusting the worker count and may hit service limits, which makes it less cost-efficient for a workload whose input varies from 20 GB to 2 TB.
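As an illustration of how little has to be specified to run the job, here is a minimal Python sketch of the request that a scheduler would send to the EMR Serverless StartJobRun API. In the scenario above, Step Functions would issue the call; boto3 is used here only to show the same fields. The function names, the job name, the S3 URIs, and the choice to pass the input and output locations as script arguments are assumptions made for this example, not details from the question.

```python
from typing import Any


def build_start_job_run_request(
    application_id: str,
    execution_role_arn: str,
    script_uri: str,
    input_uri: str,
    output_uri: str,
    timeout_minutes: int = 120,
) -> dict[str, Any]:
    """Build keyword arguments for the EMR Serverless StartJobRun call.

    No worker count or instance type appears here: the application
    scales workers for the job on its own.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        # Hypothetical job name, chosen for this example.
        "name": "clickstream-to-parquet",
        # Stop the run if it exceeds the 2-hour SLA.
        "executionTimeoutMinutes": timeout_minutes,
        "jobDriver": {
            "sparkSubmit": {
                # S3 location of the PySpark script.
                "entryPoint": script_uri,
                # Assumption: the script reads these two positional arguments.
                "entryPointArguments": [input_uri, output_uri],
            }
        },
    }


def submit(request: dict[str, Any]) -> str:
    """Send the request with boto3 and return the job run ID."""
    # Imported here so the request builder works without boto3 installed.
    import boto3

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**request)
    return response["jobRunId"]
```

Calling `submit(build_start_job_run_request(...))` requires AWS credentials, an existing EMR Serverless application, and an execution role with access to the S3 locations. The same request works unchanged for a 20 GB day and a 2 TB day, which is the point of the serverless option.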
AWS Certified Data Engineer Associate DEA-C01
Data Ingestion and Transformation