AWS Certified Data Engineer Associate DEA-C01 Practice Question

Your analytics team queries clickstream events that are written as Parquet files to an Amazon S3 data lake. An AWS Glue Data Catalog table and an Amazon Redshift Spectrum external table reference the dataset, which is partitioned by year, month, and day. A new business requirement adds the string column user_country to every new event record; historical Parquet files will not be backfilled. You must expose the new column to analysts without interrupting existing workloads, and older partitions should continue to return NULL for the column. Which action meets these requirements with the least disruption?

  • Run ALTER TABLE <external_table> REPLACE COLUMNS (...) specifying the full updated column list so that the table definition is replaced.

  • Create a new Glue table and Spectrum external table that include the user_country column, and instruct analysts to switch their queries to the new tables.

  • Issue ALTER TABLE <external_table> ADD COLUMN user_country varchar(2); from Amazon Redshift, allowing Redshift Spectrum to update the Glue table and return NULL for the column in older partitions.

  • Drop and recreate the Glue and Redshift Spectrum tables with the new schema after all historical Parquet files are backfilled.
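For reference, the ADD COLUMN syntax mentioned in the options can be sketched as follows (the external schema and table names are hypothetical placeholders; substitute your own):

```sql
-- Hypothetical names: spectrum_schema is an external schema mapped to the
-- Glue Data Catalog, clickstream_events is the Spectrum external table.
-- Adding the column updates the table definition in the Glue Data Catalog;
-- Parquet files in older partitions that lack the column return NULL for it.
ALTER TABLE spectrum_schema.clickstream_events
    ADD COLUMN user_country VARCHAR(2);
```

Because Parquet resolves columns by name, files written before the schema change simply have no user_country field, and Redshift Spectrum surfaces NULL for those rows without any rewrite of historical data.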

Domain: Data Store Management