AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team must design an AWS data lake that stores three datasets: relational product catalog tables, clickstream events arriving as semi-structured JSON, and high-resolution product images. The solution must enable ad-hoc ANSI SQL analytics across the catalog and clickstream data, catalog object metadata for the images, require minimal ongoing administration, and keep storage costs as low as possible. Which approach best meets these requirements?
Load the catalog tables into an Amazon RDS PostgreSQL instance, write the JSON events to Amazon DynamoDB, keep images in Amazon S3, and use Amazon Redshift federated queries for analytics.
Stream all data through Amazon MSK and index it in Amazon OpenSearch Service, including the images through an attachments plug-in, then run reports with OpenSearch SQL queries.
Store Parquet files for the catalog, raw JSON files for clickstream events, and the image objects in Amazon S3; register all locations and image metadata in the AWS Glue Data Catalog and query them with Amazon Athena or Amazon Redshift Spectrum.
Ingest every dataset into a single Amazon Redshift cluster, storing the product images in a BYTEA column and using standard Redshift tables for the catalog and clickstream data.
The S3-based option is the best fit. Amazon S3 provides low-cost, virtually unlimited object storage for structured, semi-structured, and unstructured data alike, so the catalog, the clickstream events, and the images can all live in one place. Converting the relational catalog tables to columnar Parquet files reduces the amount of data each query scans, and keeping the raw JSON clickstream data in S3 allows schema-on-read queries without a load step. Registering the Parquet and JSON locations, along with image object metadata, in the AWS Glue Data Catalog makes the data queryable with standard SQL: Amazon Athena is serverless, so there is no database infrastructure to provision, and Redshift Spectrum can read the same catalog from a Redshift environment if one is already in use. The other options fall short on at least one requirement: they place data in provisioned databases or clusters that cost more and must be sized and managed, they lack unified SQL access across the catalog and clickstream data, or they add operational overhead that the requirements rule out.
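As a rough illustration of the schema-on-read approach described above, the following sketch builds the kind of Athena DDL and ad-hoc query that could be used. It only assembles SQL strings and does not call AWS. The bucket, database, table, and column names are all hypothetical and do not come from the question; the clickstream table assumes the OpenX JSON SerDe, which is one of the JSON SerDes Athena supports.

```python
# Minimal sketch: build Athena statements for the S3 data lake option.
# All names below (bucket, database, tables, columns) are hypothetical.
BUCKET = "example-data-lake"
DATABASE = "lake_db"


def external_table_ddl(table, columns, location, storage_clause):
    """Return a CREATE EXTERNAL TABLE statement for data already in S3."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {DATABASE}.{table} (\n  {cols}\n)\n"
        f"{storage_clause}\n"
        f"LOCATION '{location}'"
    )


# Relational catalog data, assumed to be exported to Parquet files.
catalog_ddl = external_table_ddl(
    "product_catalog",
    [("product_id", "string"), ("name", "string"), ("price", "double")],
    f"s3://{BUCKET}/catalog/",
    "STORED AS PARQUET",
)

# Raw JSON clickstream events, read in place through a JSON SerDe.
clickstream_ddl = external_table_ddl(
    "clickstream",
    [("event_time", "string"), ("product_id", "string"), ("event_type", "string")],
    f"s3://{BUCKET}/clickstream/",
    "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",
)

# Ad-hoc SQL joining the two datasets once both tables are registered.
join_query = f"""SELECT c.name, COUNT(*) AS views
FROM {DATABASE}.clickstream e
JOIN {DATABASE}.product_catalog c ON c.product_id = e.product_id
WHERE e.event_type = 'view'
GROUP BY c.name
ORDER BY views DESC
LIMIT 10"""

if __name__ == "__main__":
    for statement in (catalog_ddl, clickstream_ddl, join_query):
        print(statement, end="\n\n")
```

Running these statements in Athena would register both tables in the Glue Data Catalog and return the ten most-viewed products. In practice a Glue crawler could infer the schemas instead of hand-written DDL, which may suit the "minimal ongoing administration" requirement better.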
Data Store Management