AWS Certified Data Engineer Associate DEA-C01 Practice Question
An organization stores Parquet data in Amazon S3 and exposes it through external tables tracked in an on-premises Apache Hive metastore. During a phased migration, new Amazon EMR clusters and Amazon Athena must read and update the same catalog while the existing Hadoop cluster keeps running. The team wants a fully managed solution that avoids manual schema synchronization. Which approach meets these goals with minimal operations?
Export the current metastore database, import it into an Amazon RDS MySQL instance, and point all future EMR clusters to that RDS endpoint while the on-premises cluster retains its original metastore.
Schedule an AWS Glue crawler to scan the S3 prefixes hourly to recreate the tables, and direct EMR and Athena to the resulting Glue databases while the on-premises cluster continues to use its local metastore.
Create Lake Formation resource links for each table and grant cross-account permissions, then query the data from EMR through Redshift Spectrum and from the on-premises cluster through its existing metastore.
Use the AWS Glue metastore-import utility to migrate the existing Hive schema into the AWS Glue Data Catalog, then configure both new EMR clusters and the on-premises Hadoop cluster to use the Glue Data Catalog as their Hive metastore.
Migrating the existing Hive metastore into the AWS Glue Data Catalog provides a fully managed, highly available catalog that Amazon Athena uses natively. After the Glue metastore-import utility has run once, all existing databases and tables are stored in Glue. EMR clusters can be launched with the AWS Glue Data Catalog enabled, so Spark, Hive, and Presto jobs read and update the same metadata store. The on-premises cluster can keep querying and updating the catalog by replacing its local metastore client with the Glue Data Catalog Hive Metastore client, so schema changes become visible to all environments without additional replication or scheduled exports. The other options fall short: the RDS option leaves two independent metastores that must be synchronized and relies on a manual export, the crawler option also keeps a separate on-premises metastore and only rebuilds tables on a schedule, and Lake Formation resource links do not provide a single authoritative Hive-compatible catalog.
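As a rough illustration of the EMR side of this answer, the sketch below builds the configuration classifications that are commonly used to point Hive and Spark on EMR at the Glue Data Catalog. The function name `glue_catalog_configurations` is hypothetical and chosen for this example; the resulting list is the kind of structure that would be supplied as cluster configurations at launch, and the exact settings should be checked against the EMR release in use.

```python
import json

# Client factory class that routes Hive metastore calls to the Glue Data Catalog.
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Return EMR configuration classifications for Hive and Spark.

    Hypothetical helper for illustration: it only builds the data structure
    and does not call any AWS API.
    """
    props = {"hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY}
    return [
        # Hive on EMR reads this classification.
        {"Classification": "hive-site", "Properties": dict(props)},
        # Spark SQL on EMR reads this classification.
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]


if __name__ == "__main__":
    # Print the JSON that could be saved and supplied when creating a cluster.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

With both classifications set, tables created by Hive or Spark on the cluster are written to the same catalog that Athena reads, which is the single-catalog behavior the explanation relies on.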
AWS Certified Data Engineer Associate DEA-C01
Data Store Management