AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team needs to build a near-real-time pipeline that reads JSON events from Amazon Kinesis Data Streams, transforms the data with an AWS Glue 4.0 streaming job, and stores the output as partitioned Apache Parquet files in Amazon S3. The solution must minimize duplicate record processing and allow the job to restart automatically from the exact position where it stopped after any failure or worker re-deployment. Which approach will satisfy these requirements?

  • Set the Spark streaming windowSize parameter to 100 seconds so the job can reprocess only the last window when it restarts.

  • Turn on AWS Glue Auto Scaling to keep at least one idle DPU worker, ensuring the stream position is maintained during failures.

  • Specify an Amazon S3 checkpoint location for the Glue streaming job so Spark Structured Streaming can persist stream offsets.

  • Enable continuous logging and set the job's Spark log level to INFO so Glue can replay uncommitted batches after a restart.
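Of the options above, it is the S3 checkpoint location that lets Spark Structured Streaming persist stream offsets and resume from the exact position after a failure. A minimal sketch of such a Glue 4.0 streaming job follows; the stream ARN, bucket paths, and the `event_date` partition column are illustrative placeholders, not values from the question.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read JSON events from Kinesis Data Streams as a streaming DataFrame.
kinesis_df = glue_context.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        # Placeholder ARN for illustration only.
        "streamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/events",
        "startingPosition": "TRIM_HORIZON",
        "classification": "json",
        "inferSchema": "true",
    },
)

def process_batch(data_frame, batch_id):
    # Transform each micro-batch and write partitioned Parquet to S3.
    if data_frame.count() > 0:
        (data_frame.write
            .mode("append")
            .partitionBy("event_date")  # assumes an event_date column exists
            .parquet("s3://example-bucket/output/"))  # placeholder path

# The checkpointLocation option is what allows the job to restart from the
# exact committed offsets after a failure or worker re-deployment,
# minimizing duplicate record processing.
glue_context.forEachBatch(
    frame=kinesis_df,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://example-bucket/checkpoints/",  # placeholder
    },
)
job.commit()
```

Note that `windowSize` alone (the first option) only controls micro-batch cadence; without a checkpoint location, offsets are not durably recorded and a restarted job cannot recover its position.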

Domain: Data Ingestion and Transformation