AWS Certified Data Engineer Associate DEA-C01 Practice Question

Your company stores web server logs as hourly CSV objects in a landing Amazon S3 bucket. Data engineers must convert each file to Snappy-compressed Parquet partitioned by date in another S3 bucket, update the AWS Glue Data Catalog table, and keep operational overhead as low as possible. Which solution satisfies these requirements in the MOST cost-effective, lowest-maintenance way?

  • Use AWS Data Pipeline to run a daily EC2 task that executes the open-source parquet-mr tool to convert incoming CSV files and copy them to the analytics bucket.

  • Launch a transient Amazon EMR cluster on a schedule, run a Spark step that converts and partitions the files, and terminate the cluster when the step completes.

  • Create an AWS Glue Spark ETL job triggered by an S3 event to convert the CSV file to Snappy-compressed Parquet, write it to a date-partitioned path in the analytics bucket, and update the Glue Data Catalog (a sketch of such a job script follows the options).

  • Configure Amazon Kinesis Data Firehose with S3 as the destination and record-format conversion enabled, and invoke the Firehose PutRecord API from a Lambda function when each object is created.
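
For reference, a minimal sketch of the Glue Spark ETL job described in the third option might look like the following. The bucket paths, database and table names, the request_time column, and the source_path job argument are all assumptions; in practice the S3 event would typically reach the job through an EventBridge rule or a small Lambda function calling the Glue StartJobRun API.

    import sys

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql.functions import col, to_date

    # Job arguments: --source_path is assumed to be supplied by the S3 event trigger
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path"])

    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the newly arrived hourly CSV object from the landing bucket
    logs_df = spark.read.option("header", "true").csv(args["source_path"])

    # Derive a date partition column; the "request_time" column name is an assumption
    logs_df = logs_df.withColumn("dt", to_date(col("request_time")))

    # Write Snappy-compressed Parquet partitioned by date and update the Data Catalog.
    # Bucket, database, and table names below are hypothetical placeholders.
    sink = glue_context.getSink(
        connection_type="s3",
        path="s3://analytics-bucket/web_server_logs/",
        enableUpdateCatalog=True,
        updateBehavior="UPDATE_IN_DATABASE",
        partitionKeys=["dt"],
        compression="snappy",
    )
    sink.setFormat("glueparquet")  # glueparquet writes Parquet; Snappy is its default codec
    sink.setCatalogInfo(catalogDatabase="analytics_db", catalogTableName="web_server_logs")
    sink.writeFrame(DynamicFrame.fromDF(logs_df, glue_context, "web_server_logs"))

    job.commit()

Because enableUpdateCatalog is set on the sink, new date partitions are registered in the Data Catalog as they are written, avoiding a separate crawler run.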

Domain: Data Ingestion and Transformation