AWS Certified Data Engineer Associate DEA-C01 Practice Question
Domain: Data Ingestion and Transformation

A company stores raw clickstream logs in Amazon S3. A PySpark job converts each day's files to partitioned Parquet before analysts arrive. Daily input ranges from 20 GB to 2 TB. The team wants to minimize operational effort, pay only for compute actually used, and still finish processing within a 2-hour SLA. Which solution best meets these requirements?
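
For context, the kind of job the scenario describes might look like the following minimal PySpark sketch. The bucket names, the JSON-lines input format, the event_timestamp field, and the hour partition column are illustrative assumptions, not details given in the question.

```python
# Minimal sketch of the daily conversion job described in the scenario.
# Bucket names, input format, and the partition column are assumptions
# for illustration only.
import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-to-parquet").getOrCreate()

run_date = sys.argv[1]  # e.g. "2024-06-01", passed in by the scheduler

# Read the day's raw clickstream files (assumed JSON lines).
raw = spark.read.json(f"s3://example-raw-clickstream/dt={run_date}/")

# Derive an hour column so analysts can prune partitions on read;
# event_timestamp is a hypothetical field assumed to be parseable.
curated = raw.withColumn("hour", F.hour(F.to_timestamp("event_timestamp")))

(curated.write
    .mode("overwrite")
    .partitionBy("hour")
    .parquet(f"s3://example-curated-clickstream/dt={run_date}/"))
```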

  • Create an Amazon EMR Serverless Spark application and invoke the PySpark script with an AWS Step Functions workflow each morning (the invocation pattern is sketched after this list).

  • Create an AWS Glue Spark job with the G.2X worker type and increase the number of workers until the job completes within the SLA.

  • Run the job on Amazon EMR on EKS, using Spot-backed worker node groups that are scaled by the Kubernetes Cluster Autoscaler.

  • Deploy a persistent EMR cluster with On-Demand core nodes and enable cluster auto scaling; schedule the PySpark job with Apache Airflow running on the master node.
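
To illustrate the first option: a Step Functions state would typically call the EMR Serverless StartJobRun API; the equivalent boto3 call is sketched below. The application ID, IAM role ARN, and S3 paths are placeholders, and this assumes a Spark application has already been created.

```python
# Minimal sketch of starting the daily run on an EMR Serverless Spark
# application (the boto3 equivalent of the Step Functions StartJobRun
# integration). All identifiers and paths are placeholders.
import boto3

emr = boto3.client("emr-serverless")

response = emr.start_job_run(
    applicationId="00example123",  # pre-created Spark application
    executionRoleArn="arn:aws:iam::123456789012:role/example-emr-serverless-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://example-scripts/clickstream_to_parquet.py",
            "entryPointArguments": ["2024-06-01"],  # the run date argument
        }
    },
)
print(response["jobRunId"])
```

Because the application is serverless, capacity scales with each day's input (20 GB or 2 TB alike) and billing stops when the run finishes, which is what the requirements in the stem hinge on.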
