AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team must design an AWS data lake that stores three datasets: relational product catalog tables, clickstream events arriving as semi-structured JSON, and high-resolution product images. The solution must support ad hoc ANSI SQL analytics across the catalog and clickstream data, maintain a catalog of object metadata for the images, require minimal ongoing administration, and keep storage costs as low as possible. Which approach best meets these requirements?

  • Load the catalog tables into an Amazon RDS PostgreSQL instance, write the JSON events to Amazon DynamoDB, keep images in Amazon S3, and use Amazon Redshift federated queries for analytics.

  • Stream all data through Amazon MSK and index it in Amazon OpenSearch Service, including the images through an attachments plug-in, then run reports with OpenSearch SQL queries.

  • Store Parquet files for the catalog, raw JSON files for clickstream events, and the image objects in Amazon S3; register all locations and image metadata in the AWS Glue Data Catalog and query them with Amazon Athena or Amazon Redshift Spectrum.

  • Ingest every dataset into a single Amazon Redshift cluster, storing the product images in a BYTEA column and using standard Redshift tables for the catalog and clickstream data.
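
To make the S3-based option more concrete, the following is a minimal sketch of how external tables over Parquet and JSON files might be declared for Amazon Athena. It only builds the DDL text and does not call AWS. The helper name external_table_ddl, the bucket example-data-lake, and all table and column names are hypothetical and chosen for illustration only; the JSON table assumes the OpenX JSON SerDe that Athena supports.

```python
def external_table_ddl(table, columns, location, fmt):
    """Build a CREATE EXTERNAL TABLE statement for files that already sit in S3.

    table    -- table name to register in the data catalog (hypothetical)
    columns  -- list of (column_name, athena_type) pairs
    location -- S3 prefix holding the files, e.g. 's3://bucket/prefix/'
    fmt      -- 'parquet' or 'json'
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    if fmt == "parquet":
        storage = "STORED AS PARQUET"
    elif fmt == "json":
        # Assumes one JSON object per line, read with the OpenX JSON SerDe.
        storage = "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'"
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"{storage}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical catalog table backed by Parquet files.
CATALOG_DDL = external_table_ddl(
    "products",
    [("product_id", "string"), ("name", "string"), ("price", "double")],
    "s3://example-data-lake/catalog/products/",
    "parquet",
)

# Hypothetical clickstream table backed by raw JSON files.
CLICKSTREAM_DDL = external_table_ddl(
    "click_events",
    [("event_time", "string"), ("product_id", "string"), ("action", "string")],
    "s3://example-data-lake/clickstream/",
    "json",
)

# An ad hoc query joining both tables once they are registered.
JOIN_QUERY = (
    "SELECT p.name, COUNT(*) AS clicks\n"
    "FROM click_events c JOIN products p ON c.product_id = p.product_id\n"
    "GROUP BY p.name"
)

if __name__ == "__main__":
    print(CATALOG_DDL)
    print(CLICKSTREAM_DDL)
    print(JOIN_QUERY)
```

The generated statements could be run in the Athena console or submitted through an SDK; the data itself stays in S3, so no cluster or database instance has to be managed for this part of the design.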

Exam domain: Data Store Management