AWS Certified Data Engineer Associate DEA-C01 Practice Question

A company runs an Amazon MSK cluster that receives thousands of sales events per second. The data engineering team must aggregate the events and write the results to Amazon S3 as compressed Apache Parquet files every 5 minutes. They want to use PySpark for the transformations, pay only for the compute they consume, and avoid managing any servers or long-running clusters. Which solution meets these requirements?

Configure AWS Lambda functions to be triggered by the MSK topic, aggregate records over 5-minute intervals, and write Parquet files to S3.
Create an AWS Glue streaming ETL job that reads from the MSK topic, sets a 5-minute micro-batch window, transforms the data with PySpark, and writes Parquet files to S3.
Build an Amazon Kinesis Data Analytics for Apache Flink application that consumes the MSK topic, performs the aggregations, and delivers the results to S3.
Launch an auto-terminating Amazon EMR cluster running Spark Streaming that polls the MSK topic and writes Parquet output to S3.
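
Whichever service runs it, the core transformation in the scenario is a 5-minute tumbling-window aggregation over the sales events. The following is a minimal plain-Python sketch of that logic only, not a streaming job. The event fields (`ts`, `product`, `amount`), the function names, and the sample data are hypothetical and do not come from the question. A real implementation would express the same grouping in PySpark and write Parquet output.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the 5-minute interval from the question


def window_start(epoch_seconds: int, window: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains the timestamp."""
    return epoch_seconds - (epoch_seconds % window)


def aggregate_sales(events, window: int = WINDOW_SECONDS):
    """Sum sale amounts per (window start, product).

    `events` is an iterable of dicts with hypothetical keys:
    ts (epoch seconds), product (str), amount (float).
    """
    totals = defaultdict(float)
    for event in events:
        key = (window_start(event["ts"], window), event["product"])
        totals[key] += event["amount"]
    return dict(totals)


# Hypothetical sample events spanning two 5-minute windows.
sample_events = [
    {"ts": 0, "product": "A", "amount": 10.0},
    {"ts": 299, "product": "A", "amount": 5.0},
    {"ts": 300, "product": "A", "amount": 7.0},
    {"ts": 450, "product": "B", "amount": 2.5},
]

result = aggregate_sales(sample_events)
```

Each key in `result` pairs a window start with a product, so every window maps to one batch of aggregated rows that would be written out as a single set of files.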

AWS Certified Data Engineer Associate DEA-C01

Data Ingestion and Transformation
