AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company ingests 50 000 JSON events per second from IoT sensors into an Amazon Kinesis Data Stream. The analytics team needs each record converted to Apache Parquet with sub-second latency and written to Amazon S3. The solution must scale automatically with the unpredictable event rate and require minimal infrastructure management. Which approach meets these requirements most effectively?
Configure an Amazon EMR cluster with Spark Structured Streaming to poll the stream and convert data to Parquet in Amazon S3.
Use AWS Lambda with Kinesis Data Streams as the event source; each invocation converts the JSON record to Parquet and writes it to Amazon S3.
Deliver the stream to Amazon S3 through Kinesis Data Firehose with a Lambda transformation that converts incoming records to Parquet format.
Create an AWS Glue streaming ETL job that reads from the Kinesis Data Stream and writes Parquet files to Amazon S3.
An AWS Glue streaming ETL job is serverless, so there are no clusters to provision or manage. It can read directly from Kinesis Data Streams, perform Spark-based transformations, and write Parquet files to Amazon S3 with micro-batch windows as small as 1 s. Glue streaming jobs support Auto Scaling, so they handle spikes in event volume without manual intervention.
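To make the description above concrete, here is a minimal sketch of what such a Glue streaming job script might look like. It is illustrative only: the stream ARN, S3 URIs, and the helper names `kinesis_source_options`, `batch_options`, and `run_job` are hypothetical, and the 1-second window simply mirrors the figure quoted in the explanation. The `awsglue` and `pyspark` modules exist only inside the Glue runtime, so they are imported inside `run_job`.

```python
def kinesis_source_options(stream_arn):
    """Connection options for reading JSON records from a Kinesis data stream."""
    return {
        "typeOfData": "kinesis",
        "streamARN": stream_arn,
        "classification": "json",
        "startingPosition": "TRIM_HORIZON",
        "inferSchema": "true",
    }


def batch_options(window_seconds, checkpoint_uri):
    """Micro-batch settings: window length and where Spark keeps its checkpoint."""
    return {
        "windowSize": f"{window_seconds} seconds",
        "checkpointLocation": checkpoint_uri,
    }


def run_job(stream_arn, output_uri, checkpoint_uri):
    """Read the stream in micro-batches and append each batch to S3 as Parquet."""
    # Imported here because these modules are only available in the Glue runtime.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    events = glue_context.create_data_frame.from_options(
        connection_type="kinesis",
        connection_options=kinesis_source_options(stream_arn),
    )

    def write_batch(batch_df, batch_id):
        # Skip empty windows so no zero-row Parquet files are written.
        if batch_df.count() > 0:
            batch_df.write.mode("append").parquet(output_uri)

    glue_context.forEachBatch(
        frame=events,
        batch_function=write_batch,
        options=batch_options(1, checkpoint_uri),  # assumed 1 s window
    )
```

In a real job, `run_job` would be called with the actual stream ARN and bucket paths, and auto scaling would be enabled in the job configuration rather than in the script.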
An Amazon EMR cluster running Spark Structured Streaming could work, but the company would still have to size, monitor, and scale the cluster, increasing operational overhead.
An AWS Lambda consumer risks hitting concurrency limits at 50 000 events per second, and converting and writing records to Parquet one at a time would produce a flood of tiny, inefficient S3 objects.
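A rough back-of-the-envelope calculation shows why the per-record Lambda design is strained at this rate. The sketch below assumes provisioned shards, the standard per-shard write limit of 1,000 records per second, and the maximum Lambda parallelization factor of 10 per shard. Record size is not given in the question, so only the record-count limit is considered, and the function names are illustrative.

```python
import math

EVENTS_PER_SECOND = 50_000          # from the question
SHARD_WRITE_LIMIT_RECORDS = 1_000   # Kinesis per-shard ingest limit (records/s)
MAX_PARALLELIZATION_FACTOR = 10     # max concurrent Lambda batches per shard


def min_shards(events_per_second, per_shard=SHARD_WRITE_LIMIT_RECORDS):
    """Fewest shards that can absorb the record rate (ignores the 1 MB/s limit)."""
    return math.ceil(events_per_second / per_shard)


def lambda_concurrency(shards, parallelization_factor):
    """Concurrent invocations when every shard is processed in parallel."""
    return shards * parallelization_factor


def objects_per_second(events_per_second, records_per_object):
    """S3 PUT rate if each Parquet object holds the given number of records."""
    return events_per_second / records_per_object


shards = min_shards(EVENTS_PER_SECOND)
print("minimum shards:", shards)
print("peak Lambda concurrency:", lambda_concurrency(shards, MAX_PARALLELIZATION_FACTOR))
print("S3 PUTs/s at one record per object:", objects_per_second(EVENTS_PER_SECOND, 1))
```

Under these assumptions, writing one object per record means tens of thousands of S3 PUT requests per second, far above the roughly 3,500 PUTs per second that S3 supports per prefix, and each Parquet file would hold a single row, which defeats the purpose of a columnar format.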
Kinesis Data Firehose can deliver to S3 and convert formats, but it buffers records by size or time interval before writing each object (historically a minimum of 60 s), which prevents sub-second latency.
Therefore, the Glue streaming ETL job best satisfies the latency, scalability, and management requirements.