AWS Certified Data Engineer Associate DEA-C01 Practice Question
An ecommerce company stores 5 GB of JSON click-stream records in an Amazon S3 prefix each day. The analytics team must convert the data to compressed Parquet, partition the output by event_date, and automatically adapt when new fields appear. The transformation must finish within one hour and require the least ongoing operational effort. Which solution meets these requirements?
A. Spin up an on-demand Amazon EMR cluster daily, run a PySpark script to convert the JSON to Parquet partitions, and terminate the cluster when the job completes.
B. Configure an Amazon S3 event to invoke an AWS Lambda function that processes each object and writes the transformed data to Parquet partitions in another bucket.
C. Create an AWS Glue Spark ETL job that reads the JSON data into a DynamicFrame and writes compressed, event_date-partitioned Parquet files back to Amazon S3.
D. Define an external table on the JSON files in Amazon Redshift Spectrum and schedule an hourly CTAS query that writes Parquet partitions to a different S3 prefix.
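The correct answer is the AWS Glue Spark ETL job. AWS Glue is a fully managed, serverless Spark environment, so there is no cluster to provision, patch, or tune. A Glue job can read the JSON into a DynamicFrame, which infers its schema from the data on each run and therefore picks up new fields without manual changes. The same job then writes compressed Parquet, partitioned by event_date, to Amazon S3 in a single run, and a 5 GB daily batch fits comfortably within the one-hour window. The other options fall short: launching an EMR cluster every day adds provisioning and tuning overhead, the Redshift Spectrum approach requires manual DDL updates to the external table whenever the schema changes, and a Lambda function is constrained by its memory ceiling and 15-minute maximum run time, which makes it a poor fit for multi-gigabyte batch transforms.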
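The Glue approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive job script: the helper names `source_options`, `sink_options`, and `run_job` are hypothetical, the S3 paths are placeholders, Snappy is only one possible compression codec, and the sketch assumes every record already carries an `event_date` field. The Glue libraries are imported inside `run_job` because they are available only in the Glue runtime.

```python
"""Sketch of a Glue Spark ETL job: JSON in, event_date-partitioned Parquet out."""


def source_options(input_path):
    """Connection options for reading every JSON object under an S3 prefix."""
    return {"paths": [input_path], "recurse": True}


def sink_options(output_path, partition_key="event_date"):
    """Connection options for writing Hive-style partitions (event_date=.../)."""
    return {"path": output_path, "partitionKeys": [partition_key]}


def run_job(input_path, output_path):
    """Read JSON into a DynamicFrame and write compressed, partitioned Parquet."""
    # Imported lazily: these modules exist only inside the Glue runtime.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # The DynamicFrame schema is inferred from the records read in this run,
    # so fields that first appear today are carried through to the output.
    clicks = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=source_options(input_path),
        format="json",
    )

    # Assumption: each record has an event_date field to partition on.
    glue_context.write_dynamic_frame.from_options(
        frame=clicks,
        connection_type="s3",
        connection_options=sink_options(output_path),
        format="parquet",
        format_options={"compression": "snappy"},
    )
```

Inside a Glue job, the script would end with a call such as `run_job("s3://example-bucket/raw/clicks/", "s3://example-bucket/curated/clicks/")`, where both paths are placeholders. A scheduled Glue trigger could then start the job once a day, which keeps the ongoing operational effort low.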
Data Ingestion and Transformation