AWS Certified Data Engineer Associate DEA-C01 Practice Question

An organization stores Parquet data in Amazon S3 and exposes it through external tables tracked in an on-premises Apache Hive metastore. During a phased migration, new Amazon EMR clusters and Amazon Athena must read and update the same catalog while the existing Hadoop cluster keeps running. The team wants a fully managed solution and wants to avoid manual schema synchronization. Which approach meets these goals with the least operational overhead?

  • Export the current metastore database, import it into an Amazon RDS MySQL instance, and point all future EMR clusters to that RDS endpoint while the on-premises cluster retains its original metastore.

  • Schedule an AWS Glue crawler to scan the S3 prefixes hourly to recreate the tables, and direct EMR and Athena to the resulting Glue databases while the on-premises cluster continues to use its local metastore.

  • Create Lake Formation resource links for each table and grant cross-account permissions, then query the data from EMR through Redshift Spectrum and from the on-premises cluster through its existing metastore.

  • Use the AWS Glue metastore-import utility to migrate the existing Hive schema into the AWS Glue Data Catalog, then configure both new EMR clusters and the on-premises Hadoop cluster to use the Glue Data Catalog as their Hive metastore.
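For reference, the last option's step of configuring EMR clusters "to use the Glue Data Catalog as their Hive metastore" is done on Amazon EMR through configuration classifications. The sketch below builds the `hive-site` and `spark-hive-site` classifications with the `hive.metastore.client.factory.class` property that the EMR documentation describes for this purpose. It covers only the EMR side. It does not show the schema import step or how a non-EMR Hadoop cluster would be pointed at the catalog, and the helper name `glue_catalog_configurations` is hypothetical.

```python
import json

# Client factory class that EMR uses to point Hive (and Spark SQL) at the
# AWS Glue Data Catalog instead of a local metastore database.
GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def glue_catalog_configurations(include_spark: bool = True) -> list:
    """Hypothetical helper: build EMR configuration classifications that
    use the Glue Data Catalog as the Hive metastore."""
    configs = [
        {
            "Classification": "hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]
    if include_spark:
        # Spark SQL takes its metastore settings from spark-hive-site.
        configs.append(
            {
                "Classification": "spark-hive-site",
                "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
            }
        )
    return configs


if __name__ == "__main__":
    # The JSON could be saved to a file and passed to
    # `aws emr create-cluster --configurations file://<file>`, or the list
    # could be passed as Configurations= to boto3's EMR run_job_flow call.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

Keeping the settings in a classification list rather than editing `hive-site.xml` on each node means every new cluster launched with the same configuration shares one catalog without per-host changes.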

Domain: Data Store Management