AWS Certified Data Engineer Associate DEA-C01 Practice Question

An e-commerce company stores 5 GB of JSON clickstream records in an Amazon S3 prefix each day. The analytics team must convert the data to compressed Parquet, partition the output by event_date, and automatically adapt when new fields appear. The transformation must finish within one hour and require the least ongoing operational effort. Which solution meets these requirements?

  • Spin up an on-demand Amazon EMR cluster daily, run a PySpark script to convert the JSON to Parquet partitions, and terminate the cluster when the job completes.

  • Configure an Amazon S3 event to invoke an AWS Lambda function that processes each object and writes the transformed data to Parquet partitions in another bucket.

  • Create an AWS Glue Spark ETL job that reads the JSON data into a DynamicFrame and writes compressed, event_date-partitioned Parquet files back to Amazon S3.

  • Define an external table on the JSON files in Amazon Redshift Spectrum and schedule an hourly CTAS query that writes Parquet partitions to a different S3 prefix.
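For context, a minimal sketch of the AWS Glue DynamicFrame approach described in the third option is shown below. The bucket names, prefixes, and the choice of Snappy compression are illustrative assumptions, not details given in the question; DynamicFrames infer the schema at read time, which is what lets the job tolerate newly appearing fields without code changes.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap; JOB_NAME is supplied by the Glue service.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw JSON clickstream records into a DynamicFrame.
# Schema is inferred on read, so new fields are picked up automatically.
# The S3 paths here are hypothetical placeholders.
clicks = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-clickstream-raw/daily/"]},
    format="json",
)

# Write compressed Parquet back to S3, partitioned by event_date.
glue_context.write_dynamic_frame.from_options(
    frame=clicks,
    connection_type="s3",
    connection_options={
        "path": "s3://example-clickstream-curated/parquet/",
        "partitionKeys": ["event_date"],
    },
    format="parquet",
    format_options={"compression": "snappy"},
)

job.commit()
```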
