AWS Certified Data Engineer Associate DEA-C01 Practice Question
An analytics team has cataloged several Parquet tables in the AWS Glue Data Catalog. You are launching an Amazon EMR 6.6 cluster running Spark SQL to process the data in Amazon S3. The cluster should query the tables without copying metadata locally, and any changes to table definitions made in the Data Catalog must become immediately available to the jobs. Which solution meets these requirements with the least operational effort?
Configure the EMR cluster to use AWS Glue Data Catalog as its Hive metastore by enabling the glue-datacatalog integration and granting the cluster role Glue permissions.
Create an Amazon Athena workgroup and connect the cluster's Spark engine to it through the JDBC driver so Spark queries can read the tables.
Add a bootstrap action that exports the Data Catalog tables as Hive DDL statements and executes them with beeline to populate the cluster's local metastore at startup.
Run an hourly AWS Glue crawler that writes updated schemas into the cluster's default Hive metastore hosted on Amazon RDS.
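Configuring the EMR cluster to use the AWS Glue Data Catalog as its Hive metastore lets Spark SQL and Hive reference the existing table definitions directly. The cluster is pointed at the Data Catalog by setting the hive.metastore.client.factory.class property to the Glue-specific client factory class in the spark-hive-site classification (and in hive-site if Hive is also used), or by selecting the Glue Data Catalog option for Spark table metadata when creating the cluster in the console. The cluster's EC2 instance profile role also needs AWS Glue permissions. Because the catalog remains external to the cluster, schema updates or new partitions registered in AWS Glue are visible to subsequent queries without additional scripts or crawlers. Exporting the tables as DDL and replaying them into a local Hive metastore during bootstrap creates a separate copy that must be maintained, increasing operational overhead. The hourly crawler fails for a similar reason: AWS Glue crawlers write to the Data Catalog rather than to an external Hive metastore, and an hourly schedule would not make changes immediately available. Connecting Spark through an Athena JDBC driver does not provide direct access to Glue metadata for Spark SQL processing on the cluster and adds unnecessary complexity.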
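As a rough illustration of the configuration described above, the following Python sketch builds the classification JSON that points Spark SQL (and optionally Hive) at the Glue Data Catalog. The helper name glue_catalog_configurations and the include_hive flag are hypothetical and exist only for this example; the classification names and the factory class are the ones documented for Amazon EMR, but verify them against the release you are using.

```python
import json

# Glue-specific Hive metastore client factory class documented for Amazon EMR.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations(include_hive=True):
    """Return an EMR-style configurations list (hypothetical helper).

    spark-hive-site covers Spark SQL; hive-site is added only when
    Hive on the same cluster should share the catalog.
    """
    classifications = ["spark-hive-site"]
    if include_hive:
        classifications.append("hive-site")
    return [
        {
            "Classification": name,
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_FACTORY
            },
        }
        for name in classifications
    ]


configurations = glue_catalog_configurations()
# Print the JSON so it can be saved to a file and supplied at cluster creation.
print(json.dumps(configurations, indent=2))
```

The printed JSON could be saved to a file and passed to aws emr create-cluster through the --configurations option, or supplied as the Configurations parameter of boto3's run_job_flow. No metadata is copied to the cluster in either case; the setting only tells the metastore client where to look.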
Data Store Management