AWS Certified Data Engineer Associate DEA-C01 Practice Question

An analytics team has cataloged several Parquet tables in the AWS Glue Data Catalog. You are launching an Amazon EMR 6.6 cluster that runs Spark SQL to process the data in Amazon S3. The cluster must query the tables without copying metadata locally, and any changes to table definitions made in the Data Catalog must become immediately available to the jobs. Which solution meets these requirements with the least operational effort?

  • Configure the EMR cluster to use AWS Glue Data Catalog as its Hive metastore by enabling the glue-datacatalog integration and granting the cluster role Glue permissions.

  • Create an Amazon Athena workgroup and connect the cluster's Spark engine to it through the JDBC driver so Spark queries can read the tables.

  • Add a bootstrap action that exports the Data Catalog tables as Hive DDL statements and executes them with beeline to populate the cluster's local metastore at startup.

  • Run an hourly AWS Glue crawler that writes updated schemas into the cluster's default Hive metastore hosted on Amazon RDS.
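
For reference, the Glue Data Catalog integration named in the first option is normally switched on through an EMR configuration classification. The sketch below builds that classification for Spark SQL as a plain Python structure. The function name `glue_catalog_configurations` is hypothetical and exists only for illustration. The `spark-hive-site` classification and the client factory class are the values documented for EMR. How the list is passed to the cluster (console, CLI, or the `Configurations` argument of an SDK call) is left as an assumption.

```python
import json

# Hive metastore client factory that points Spark SQL on EMR at the
# AWS Glue Data Catalog instead of a cluster-local metastore.
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Return an EMR configuration list enabling Glue as the Spark metastore.

    Hypothetical helper for illustration; only the classification name and
    the property key/value come from the EMR documentation.
    """
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY,
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON that could be supplied as the cluster's configurations.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

The cluster's EC2 instance role would also need Glue permissions (for example `glue:GetDatabase`, `glue:GetTable`, and `glue:GetPartitions`) for the lookups to succeed. The exact policy scope depends on the account setup.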

Data Store Management