AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineer must enable analysts to run ad hoc SQL queries from Amazon Athena, Amazon Redshift Spectrum, and Presto on Amazon EMR against semi-structured JSON files stored in an Amazon S3 data lake. The solution must avoid duplicating table definitions and must automatically detect new daily partitions that land under the same S3 prefix. Which approach meets these requirements with minimal operational overhead?

  • Embed the JSON schema in every Spark job and instruct analysts to load the data into temporary views before running SQL queries.

  • Configure an AWS Glue crawler on the S3 prefix to populate an AWS Glue Data Catalog table and have all query engines reference that catalog.

  • Store Avro schema definition files alongside the data in S3 and rely on each engine's SerDe to discover new partitions at query time.

  • Create separate external tables with identical names in Athena, Redshift Spectrum, and the EMR Hive metastore, updating each table manually when partitions arrive.
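
For readers unfamiliar with the crawler-based option, the following is a minimal sketch of what configuring an AWS Glue crawler on an S3 prefix might look like with boto3. The crawler name, IAM role ARN, database name, bucket path, and daily schedule are all hypothetical placeholders, not values from the question. The request is built as a plain dictionary so it can be inspected without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path,
                          schedule="cron(0 2 * * ? *)"):
    """Build keyword arguments for the Glue CreateCrawler API call.

    All argument values are supplied by the caller; the default schedule
    (daily at 02:00 UTC) is an assumption made for illustration.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans this prefix and registers tables and partitions.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": schedule,
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_crawler(request):
    """Send the request to AWS Glue (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the request.
    request = build_crawler_request(
        name="daily-json-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="example_data_lake",
        s3_path="s3://example-bucket/events/",
    )
    print(request["Targets"])
```

Calling `create_crawler(request)` would register the crawler. Engines that are configured to use the Glue Data Catalog as their metastore would then read the same table definition.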

Exam domain: Data Store Management