AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company runs an Amazon MSK cluster that receives thousands of sales events per second. The data engineering team must aggregate the events and write the results to Amazon S3 as compressed Apache Parquet files every 5 minutes. They want to use PySpark for the transformations, pay only for the compute they consume, and avoid managing any servers or long-running clusters. Which solution meets these requirements?
Launch an auto-terminating Amazon EMR cluster running Spark Streaming that polls the MSK topic and writes Parquet output to S3.
Configure AWS Lambda functions to be triggered by the MSK topic, aggregate records over 5-minute intervals, and write Parquet files to S3.
Build an Amazon Kinesis Data Analytics for Apache Flink application that consumes the MSK topic, performs the aggregations, and delivers the results to S3.
Create an AWS Glue streaming ETL job that reads from the MSK topic, sets a 5-minute micro-batch window, transforms the data with PySpark, and writes Parquet files to S3.
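AWS Glue streaming ETL jobs are serverless Apache Spark applications. They can read directly from Amazon MSK topics, transform the data with PySpark, and process the stream in micro-batches on a user-defined interval (here, 5 minutes) before writing Parquet files to Amazon S3. There are no servers or clusters to provision or manage, and Glue bills per second for the DPUs the job uses.

The other options each miss at least one requirement. Amazon EMR requires provisioning and managing a cluster, and because a Spark Streaming job runs continuously, an "auto-terminating" cluster would stay up (and accrue cost) for as long as the stream is being processed. Kinesis Data Analytics for Apache Flink runs Apache Flink applications, not Apache Spark, so it cannot run the team's PySpark code. AWS Lambda can consume records from MSK, but it provides no Spark runtime, each invocation is limited in duration and payload size, and no state is guaranteed to persist between invocations, so 5-minute aggregations over thousands of events per second would need an external state store plus custom Parquet-writing code.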
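To make the 5-minute micro-batch idea concrete, here is a plain-Python sketch (not the AWS Glue or PySpark API) of what the aggregation computes: records are bucketed into consecutive, non-overlapping 300-second intervals and summed per store. The record shape (`SaleEvent` with `arrival_ts`, `store_id`, `amount_cents`) and the helper names are hypothetical, since the question gives no schema. In an actual Glue streaming script, the interval would instead be set through the `windowSize` option of `glueContext.forEachBatch`, and each batch would be aggregated with PySpark.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 300  # 5-minute interval from the question


@dataclass(frozen=True)
class SaleEvent:
    # Hypothetical record shape; the question does not define a schema.
    arrival_ts: int    # epoch seconds when the record was read from the topic
    store_id: str
    amount_cents: int  # integer cents avoid floating-point rounding


def window_start(ts: int, window: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed, non-overlapping interval containing ts."""
    return ts - (ts % window)


def aggregate(
    events: Iterable[SaleEvent], window: int = WINDOW_SECONDS
) -> Dict[Tuple[int, str], Tuple[int, int]]:
    """Sum amounts and count events per (interval start, store)."""
    totals: Dict[Tuple[int, str], list] = defaultdict(lambda: [0, 0])
    for e in events:
        bucket = totals[(window_start(e.arrival_ts, window), e.store_id)]
        bucket[0] += e.amount_cents  # running total for this interval/store
        bucket[1] += 1               # number of events seen
    # Sorted so each interval's rows come out together, in time order.
    return {key: (s, n) for key, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        SaleEvent(0, "s1", 1000),
        SaleEvent(299, "s1", 500),
        SaleEvent(300, "s1", 200),
        SaleEvent(10, "s2", 700),
    ]
    for (start, store), (total, count) in aggregate(sample).items():
        print(start, store, total, count)
```

The events at seconds 0 and 299 fall into the same interval, while the event at second 300 opens the next one. In the Glue job, each micro-batch already covers one interval, so the per-batch function would only need the per-store grouping before writing the result to S3 as Parquet.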
Exam domain: Data Ingestion and Transformation