AWS Certified Data Engineer Associate DEA-C01 Practice Question

A media company uses an S3 data lake. CSV files are delivered every hour to the prefix s3://company-raw/year=/month=/day=/. A data engineer must convert each new batch to Apache Parquet, partitioned by the same date keys, and catalog the resulting tables so they are queryable in Amazon Athena. The solution must:
  • avoid re-processing files that were already converted
  • scale without provisioning or managing servers
  • require the least custom code

Which approach meets these requirements MOST cost-effectively?

  • Launch an AWS Glue Python shell job on an hourly schedule that reads the CSV files with pandas, converts them to Parquet, and writes the results to the curated prefix.

  • Set up an Amazon Kinesis Data Firehose delivery stream with an S3 source and Parquet output conversion enabled, then point it at the raw bucket prefix.

  • Create an AWS Glue Spark ETL job that reads from the raw S3 prefix, enables job bookmarks, writes the output in Parquet to an s3://company-curated/ prefix partitioned by year, month, and day, and updates the AWS Glue Data Catalog on each run (see the first sketch after the options).

  • Configure an AWS Lambda function triggered by S3 ObjectCreated events that converts each CSV file to Parquet, writes it to the curated bucket, and uses the Athena API to add partitions (see the second sketch after the options).
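
For reference, here is a minimal sketch of the Glue Spark ETL approach described in the third option. The catalog database and table names (curated_db, events) are assumptions, and the script assumes the year, month, and day columns are available in the frame so they can be used as partition keys; job bookmarks are turned on through the job's bookmark setting rather than in the script itself.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Bookmarks require the job to run with --job-bookmark-option job-bookmark-enable.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read only CSV batches that have not been processed yet; the bookmark is
# keyed on the transformation_ctx of this read.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://company-raw/"], "recurse": True},
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="source",
)

# Write Parquet partitioned by the same date keys and create/update the
# table in the Glue Data Catalog on each run (names are hypothetical).
sink = glue_context.getSink(
    connection_type="s3",
    path="s3://company-curated/",
    partitionKeys=["year", "month", "day"],
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
    transformation_ctx="sink",
)
sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase="curated_db", catalogTableName="events")
sink.writeFrame(source)

job.commit()  # advances the bookmark so these files are not re-read
```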
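The Lambda option relies on the same pandas-based CSV-to-Parquet conversion as the Python shell option, just triggered per object. A rough sketch, assuming a hypothetical table curated_db.events, an Athena query-results location, and that pandas plus pyarrow are packaged with the function:

```python
import io
import urllib.parse

import boto3
import pandas as pd  # to_parquet() below also needs pyarrow installed

s3 = boto3.client("s3")
athena = boto3.client("athena")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Convert the newly delivered CSV object to Parquet in memory.
        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(obj["Body"])
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)

        # Keep the same year=/month=/day= prefix in the curated bucket.
        out_key = key.rsplit(".", 1)[0] + ".parquet"
        s3.put_object(Bucket="company-curated", Key=out_key, Body=buf.getvalue())

        # Register the partition so Athena can query it (table name and
        # results location are assumptions).
        parts = dict(p.split("=", 1) for p in key.split("/") if "=" in p)
        athena.start_query_execution(
            QueryString=(
                "ALTER TABLE curated_db.events ADD IF NOT EXISTS PARTITION "
                f"(year='{parts['year']}', month='{parts['month']}', day='{parts['day']}') "
                f"LOCATION 's3://company-curated/{out_key.rsplit('/', 1)[0]}/'"
            ),
            ResultConfiguration={"OutputLocation": "s3://company-curated/athena-query-results/"},
        )
```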
