AWS Certified Data Engineer Associate DEA-C01 Practice Question
Your company ingests website click-stream events that are serialized as JSON. The structure of the events will evolve as new product features are released, and the data engineering team wants analysts to run ad-hoc SQL queries in Amazon Redshift without performing manual DDL each time a new attribute appears. The solution must keep storage costs low and avoid interrupting existing queries. Which design meets these requirements?
Use AWS Database Migration Service (AWS DMS) to load the events from S3 into a Redshift columnar table and run a nightly job that issues ALTER TABLE ADD COLUMN statements for any new attributes.
Write the JSON events to Amazon S3, use an AWS Glue crawler to catalog the files, and create an Amazon Redshift Spectrum external table that references the Glue Data Catalog.
Stream the JSON events directly into an Amazon Redshift table that uses the SUPER data type and rely on Redshift to surface new keys automatically.
Persist the events in an Amazon RDS PostgreSQL database and query the table from Redshift by using federated queries.
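The design that meets the requirements is the one that writes the JSON events to Amazon S3, catalogs them with an AWS Glue crawler, and queries them through a Redshift Spectrum external table. Landing the raw JSON objects in Amazon S3 keeps storage costs lower than storing them inside the data warehouse. Each time the AWS Glue crawler runs, it infers the schema from the files and updates the table definition in the Glue Data Catalog with any new attributes it discovers. Creating an external schema in Amazon Redshift that points to the Glue Data Catalog lets analysts query the data through Redshift Spectrum, which applies that schema when the data is read. Because Spectrum consults the Data Catalog at query time, new attributes become available to analysts as soon as the crawler has recorded them, with no ALTER TABLE commands and no downtime.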
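A minimal sketch of the Redshift side of this design is shown below. It only builds the SQL text and does not connect to any cluster. The schema name, Data Catalog database, table name, column names, and IAM role ARN are hypothetical placeholders, not values from the question.

```python
# Sketch: build the SQL an administrator might run once to expose a Glue
# Data Catalog database to Redshift Spectrum, plus an ad-hoc analyst query.
# All identifiers passed in below are hypothetical placeholders.


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Return a CREATE EXTERNAL SCHEMA statement that points at the Glue Data Catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def adhoc_query_sql(schema: str, table: str, columns: list[str], limit: int = 10) -> str:
    """Return a simple SELECT against the external table."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {schema}.{table} LIMIT {limit};"


if __name__ == "__main__":
    print(external_schema_sql(
        schema="clickstream_ext",                                   # assumed name
        catalog_db="clickstream_db",                                # assumed Glue database
        iam_role_arn="arn:aws:iam::123456789012:role/SpectrumRole", # placeholder ARN
    ))
    # After the crawler records a new attribute (here the made-up "coupon_code"),
    # analysts can select it without any DDL being issued in Redshift.
    print(adhoc_query_sql("clickstream_ext", "events", ["user_id", "page", "coupon_code"]))
```

The external schema is created once. Later schema changes happen in the Data Catalog, so nothing in this SQL needs to be rerun when a new attribute appears.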
Why the other designs fall short:
Streaming the JSON directly into a Redshift table that uses the SUPER data type does allow schemaless ingestion without DDL, but all raw data is stored in Redshift Managed Storage, which is more expensive than S3, so it does not satisfy the cost constraint.
Using AWS DMS to load the events and running nightly ALTER TABLE commands adds operational overhead and locks the table during DDL, interrupting queries.
Persisting the events in an Amazon RDS PostgreSQL database and querying through Redshift federated queries keeps the data in a full relational database, incurs higher storage and compute costs than S3, and still doesn't provide automatic schema evolution for semi-structured JSON columns.
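To make the operational overhead of the nightly ALTER TABLE design concrete, here is a rough sketch of the bookkeeping such a job would need: scan the new events, find attributes the table lacks, and emit one DDL statement per attribute. The table name, event fields, and type-mapping rules are assumptions chosen for illustration only.

```python
import json

# Sketch of the manual schema-evolution work the rejected nightly-DDL design
# implies. The type mapping below is a simplifying assumption; nested objects
# and arrays simply fall back to a wide VARCHAR.


def infer_redshift_type(value) -> str:
    """Map a decoded JSON value to a Redshift column type (simplified)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    return "VARCHAR(65535)"


def missing_column_ddl(table: str, existing_columns: set[str], events: list[str]) -> list[str]:
    """Return ALTER TABLE statements for attributes not yet present in the table."""
    new_columns: dict[str, str] = {}
    for raw in events:
        for key, value in json.loads(raw).items():
            if key not in existing_columns and key not in new_columns:
                new_columns[key] = infer_redshift_type(value)
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {col_type};"
        for name, col_type in new_columns.items()
    ]


if __name__ == "__main__":
    sample_events = [  # made-up click-stream events
        '{"user_id": 1, "page": "/home"}',
        '{"user_id": 2, "page": "/cart", "coupon_code": "SPRING", "cart_value": 19.5}',
    ]
    for statement in missing_column_ddl("clicks", {"user_id", "page"}, sample_events):
        print(statement)
```

Every statement this job produces is DDL that must run against the live table, which is exactly the step the S3, Glue crawler, and Spectrum design avoids.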
AWS Certified Data Engineer Associate DEA-C01
Data Store Management