AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team receives a 5-TB JSON file in an S3 bucket each day. They must flatten nested objects, convert the data to partitioned Parquet, and make it queryable in Athena within two hours. The team wants a fully managed, serverless solution and prefers to avoid provisioning persistent clusters. Which approach meets these requirements most cost-effectively?

  • Build an Amazon Kinesis Data Analytics for Apache Flink application that uses the Amazon S3 connector to process the file and output Parquet data to S3.

  • Run an Amazon Athena CTAS statement that reads the JSON file and writes the result as partitioned Parquet objects to a separate S3 location.

  • Spin up an on-demand Amazon EMR cluster with Apache Spark each day, run a Spark transformation job, and terminate the cluster after the job finishes.

  • Create an AWS Glue Spark ETL job with job bookmarks enabled that reads the JSON file, flattens the data, writes partitioned Parquet back to S3, and updates the Glue Data Catalog.
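The core transformation the question describes, flattening nested JSON objects into top-level columns before writing Parquet, can be sketched in plain Python. This is only an illustration of the idea; at 5-TB scale a Glue Spark ETL job would perform the equivalent work in a distributed fashion (the `event` record below is a made-up example):

```python
# Minimal sketch of the "flatten nested objects" step in plain Python.
# A Glue Spark job would do this at scale; the core idea is the same:
# recursively hoist nested keys into dotted top-level columns, which
# then map cleanly onto Parquet's columnar layout.
def flatten(record, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

# Hypothetical input record for illustration.
event = {"user": {"id": 7, "geo": {"country": "DE"}}, "amount": 12.5}
print(flatten(event))
# → {'user.id': 7, 'user.geo.country': 'DE', 'amount': 12.5}
```

After flattening, each dotted key becomes a Parquet column, and a partition key (for example, an event date) determines the S3 prefix the row is written under.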

Domain: Data Ingestion and Transformation