AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team needs to build a near-real-time pipeline that reads JSON events from Amazon Kinesis Data Streams, transforms the data with an AWS Glue 4.0 streaming job, and stores the output as partitioned Apache Parquet files in Amazon S3. The solution must minimize duplicate record processing and allow the job to restart automatically from the exact position where it stopped after any failure or worker re-deployment. Which approach will satisfy these requirements?

  • Set the Spark streaming windowSize parameter to 100 seconds so the job can reprocess only the last window when it restarts.

  • Turn on AWS Glue Auto Scaling to keep at least one idle DPU worker, ensuring the stream position is maintained during failures.

  • Specify an Amazon S3 checkpoint location for the Glue streaming job so Spark Structured Streaming can persist stream offsets.

  • Enable continuous logging and set the job's Spark log level to INFO so Glue can replay uncommitted batches after a restart.
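Of the options above, it is the S3 checkpoint location that lets Spark Structured Streaming persist stream offsets and resume from the exact position after a failure. A minimal sketch of such a Glue 4.0 streaming job follows; the stream ARN, bucket paths, and the `event_date` partition column are illustrative placeholders, not values from the question.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read JSON events from Kinesis Data Streams as a streaming DataFrame.
kinesis_df = glue_context.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        # Placeholder ARN for illustration only.
        "streamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/events",
        "startingPosition": "TRIM_HORIZON",
        "classification": "json",
        "inferSchema": "true",
    },
)

def process_batch(data_frame, batch_id):
    # Transform each micro-batch and write partitioned Parquet to S3.
    if data_frame.count() > 0:
        (data_frame.write
            .mode("append")
            .partitionBy("event_date")  # assumes an event_date column exists
            .parquet("s3://example-bucket/output/"))  # placeholder path

# The checkpointLocation option is what allows the job to restart from the
# exact committed offsets after a failure or worker re-deployment,
# minimizing duplicate record processing.
glue_context.forEachBatch(
    frame=kinesis_df,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://example-bucket/checkpoints/",  # placeholder
    },
)
job.commit()
```

Note that `windowSize` alone (the first option) only controls micro-batch cadence; without a checkpoint location, offsets are not durably recorded and a restarted job cannot recover its position.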

Domain: Data Ingestion and Transformation