AWS Certified Data Engineer Associate DEA-C01 Practice Question

Your company receives hourly comma-separated values (CSV) log files in an Amazon S3 prefix. Data analysts use Amazon Athena for ad hoc queries, but scan costs and runtimes are increasing as the dataset grows. As a data engineer, you must convert both existing and future files to an optimized columnar format, partition the data by event_date, and avoid managing any servers or long-running clusters.

Which solution MOST cost-effectively meets these requirements?

  • Modify the source application to write Parquet files directly to the target S3 prefix and drop the existing CSV files once verified.

  • Enable S3 Storage Lens and apply Lifecycle rules to transition the CSV objects to the S3 Glacier Flexible Retrieval storage class after 30 days to reduce storage and Athena scan costs.

  • Provision an Amazon EMR cluster with Apache Hive, run a Hive query that inserts the CSV data into a table STORED AS ORC to convert it, and keep the cluster running to process new hourly files.

  • Create an AWS Glue crawler to catalog the CSV files, then schedule an AWS Glue Spark job that reads the crawler's table, writes Snappy-compressed Parquet files partitioned by event_date to a new S3 prefix, and updates the Data Catalog.
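The Glue-based option hinges on writing the output under Hive-style partition prefixes (event_date=YYYY-MM-DD/) so Athena can prune partitions at query time. As a minimal local sketch of that partitioning logic, the snippet below groups CSV rows by a date column into partition keys; the column names and sample data are hypothetical, and in the actual Glue job the equivalent step would be a Spark write with partitionKeys (or partitionBy) emitting Snappy-compressed Parquet instead of in-memory lists.

```python
import csv
import io
from collections import defaultdict

def partition_rows(csv_text, date_field="event_date"):
    """Group CSV rows under Hive-style partition keys, e.g. 'event_date=2024-05-01'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    partitions = defaultdict(list)
    for row in reader:
        # Each distinct date value becomes one S3 partition prefix.
        partitions[f"{date_field}={row[date_field]}"].append(row)
    return dict(partitions)

# Hypothetical sample log data for illustration.
sample = (
    "event_date,user,action\n"
    "2024-05-01,alice,login\n"
    "2024-05-01,bob,click\n"
    "2024-05-02,alice,logout\n"
)

parts = partition_rows(sample)
print(sorted(parts))  # ['event_date=2024-05-01', 'event_date=2024-05-02']
```

Because Athena reads only the partitions a query's WHERE clause selects, this layout (combined with columnar Parquet) is what drives the scan-cost reduction the question asks about.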

Domain: Data Ingestion and Transformation