AWS Certified Data Engineer Associate DEA-C01 Practice Question

A company runs an Amazon MSK cluster that receives thousands of sales events per second. The data engineering team must aggregate the events and write the results to Amazon S3 as compressed Apache Parquet files every 5 minutes. They want to use PySpark for the transformations, pay only for the compute they consume, and avoid managing any servers or long-running clusters. Which solution meets these requirements?

  • Launch an auto-terminating Amazon EMR cluster running Spark Streaming that polls the MSK topic and writes Parquet output to S3.

  • Configure AWS Lambda functions to be triggered by the MSK topic, aggregate records over 5-minute intervals, and write Parquet files to S3.

  • Build an Amazon Kinesis Data Analytics for Apache Flink application that consumes the MSK topic, performs the aggregations, and delivers the results to S3.

  • Create an AWS Glue streaming ETL job that reads from the MSK topic, sets a 5-minute micro-batch window, transforms the data with PySpark, and writes Parquet files to S3.
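The last option describes the kind of serverless micro-batch pipeline AWS Glue streaming ETL provides. As a rough illustration only, a Glue streaming script for this scenario might look like the sketch below. Every name here is a placeholder assumption (the `msk-connection` Glue connection, the `sales-events` topic, the `product_id`/`amount` fields, and the S3 paths), and the script requires the AWS Glue runtime, so it is not runnable as-is outside a Glue job.

```python
# Hypothetical AWS Glue streaming job sketch. Connection, topic, field,
# and bucket names are assumptions for illustration; runs only inside
# the AWS Glue streaming runtime.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql.functions import col, sum as sum_

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the MSK (Kafka) topic as a streaming DataFrame.
events = glue_context.create_data_frame.from_options(
    connection_type="kafka",
    connection_options={
        "connectionName": "msk-connection",   # assumed Glue connection to MSK
        "topicName": "sales-events",          # assumed topic name
        "startingOffsets": "latest",
        "classification": "json",
    },
    transformation_ctx="events",
)

def aggregate_batch(batch_df, batch_id):
    # Called once per micro-batch: aggregate sales and append Parquet
    # (Spark writes Parquet with snappy compression by default).
    if batch_df.count() > 0:
        (batch_df.groupBy("product_id")
                 .agg(sum_(col("amount")).alias("total_amount"))
                 .write.mode("append")
                 .parquet("s3://example-bucket/sales-aggregates/"))

# windowSize sets the 5-minute micro-batch interval from the question.
glue_context.forEachBatch(
    frame=events,
    batch_function=aggregate_batch,
    options={
        "windowSize": "300 seconds",
        "checkpointLocation": "s3://example-bucket/checkpoints/",
    },
)
job.commit()
```

Because Glue is serverless and billed per DPU-second while the job runs, this pattern satisfies the pay-per-use and no-cluster-management requirements without an always-on Spark cluster.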

Domain: Data Ingestion and Transformation