AWS Certified Data Engineer Associate DEA-C01 Practice Question
Your company launches short-lived Amazon EMR clusters to transform Parquet files stored in Amazon S3. Hive table definitions are currently kept in a MySQL-backed Apache Hive metastore running on Amazon RDS. The data engineering team wants every new EMR cluster and Amazon Athena to reference the same metadata while minimizing administration and eliminating single points of failure. Which solution best meets these requirements?
Enable multi-AZ read replicas for the current Amazon RDS MySQL metastore and have each EMR cluster connect to the RDS writer endpoint.
Create a multi-node, long-running EMR cluster dedicated to hosting the hive-metastore service and point transient clusters and Athena to its Thrift endpoint.
Store the hive-site.xml file that contains table metadata in Amazon S3 and load it into each EMR cluster during bootstrap actions.
Configure every EMR cluster to use AWS Glue Data Catalog as its Hive metastore and migrate the existing metadata into the catalog.
AWS Glue Data Catalog is a fully managed, highly available, serverless metadata store. Athena already uses the Data Catalog by default, and EMR clusters can be configured to treat the catalog as their Hive metastore at cluster launch by setting the hive.metastore.client.factory.class property to com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory in the hive-site and spark-hive-site configuration classifications. Migrating the existing RDS-based Hive metastore into the Data Catalog gives all transient EMR clusters and Athena a shared, resilient catalog with no infrastructure to manage.
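The launch-time configuration described above can be sketched as follows. This is a minimal illustration, not a complete cluster definition: the helper name glue_catalog_configurations is hypothetical, and the sketch only builds the configuration objects that point Hive and Spark SQL at the Data Catalog. It assumes the cluster's IAM role already has permission to call AWS Glue.

```python
import json

# Client factory class that the EMR documentation names for using the
# AWS Glue Data Catalog as the Hive metastore.
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Build EMR configuration objects that point Hive and Spark SQL
    at the Glue Data Catalog. (Hypothetical helper for illustration.)"""
    properties = {"hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY}
    return [
        # Used by Hive on the cluster.
        {"Classification": "hive-site", "Properties": dict(properties)},
        # Used by Spark SQL on the cluster.
        {"Classification": "spark-hive-site", "Properties": dict(properties)},
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file or inspected.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

The resulting list could be passed as the Configurations argument of the boto3 EMR client's run_job_flow call, or saved as JSON and supplied to aws emr create-cluster through the --configurations option. Because every transient cluster receives the same two classifications, no cluster needs to know about the old RDS endpoint.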
Running a dedicated EMR cluster merely to host the hive-metastore service adds cost and operational burden and still represents a single point of failure. Adding replicas to the RDS MySQL metastore improves the database's availability, but it does not by itself give Athena a catalog it can use, and the database still requires administration. The hive-site.xml file holds Hive configuration rather than table definitions, so copying it to S3 does not store the table metadata and provides no shared, durable metastore. Therefore, using AWS Glue Data Catalog as the metastore is the most operationally efficient and highly available choice.
Exam domain: Data Store Management