AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team receives a 5-TB JSON file in an S3 bucket each day. They must flatten nested objects, convert the data to partitioned Parquet, and make it queryable in Athena within two hours. The team wants a fully managed, serverless solution and prefers to avoid provisioning persistent clusters. Which approach meets these requirements most cost-effectively?
Spin up an on-demand Amazon EMR cluster with Apache Spark each day, run a Spark transformation job, and terminate the cluster after the job finishes.
Run an Amazon Athena CTAS statement that reads the JSON file and writes the result as partitioned Parquet objects to a separate S3 location.
Build an Amazon Kinesis Data Analytics for Apache Flink application that uses the Amazon S3 connector to process the file and output Parquet data to S3.
Create an AWS Glue Spark ETL job with job bookmarks enabled that reads the JSON file, flattens the data, writes partitioned Parquet back to S3, and updates the Glue Data Catalog.
AWS Glue provides a fully managed, serverless Spark environment billed per DPU-hour and metered by the second, so the team can run a batch ETL job only when the file arrives and pay only while the job is running. Spark transforms can flatten nested JSON and write partitioned Parquet back to Amazon S3, and the job can update the AWS Glue Data Catalog so Athena can query the new partitions as soon as the job finishes. Job bookmarks track which objects have already been processed across runs, so the team does not have to maintain that state itself.
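As a rough illustration of the flattening and partitioning steps described above, here is a minimal sketch in plain Python. It is not Glue or Spark code: the function names `flatten_record` and `partition_prefix`, the underscore separator, and the sample record with its `event_date` field are all hypothetical and chosen only for this example.

```python
from typing import Any, Dict, List


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining key names with `sep`.

    Assumption: only nested objects are flattened; lists are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the column prefix along.
            flat.update(flatten_record(value, column, sep))
        else:
            flat[column] = value
    return flat


def partition_prefix(flat_record: Dict[str, Any], partition_keys: List[str]) -> str:
    """Build a Hive-style partition prefix (key=value/key=value) from a flat record."""
    return "/".join(f"{key}={flat_record[key]}" for key in partition_keys)


# Hypothetical sample record, not taken from the question.
sample = {
    "order_id": 17,
    "customer": {"id": "c-9", "address": {"country": "DE"}},
    "event_date": "2024-05-01",
}

flat = flatten_record(sample)
print(flat)
print(partition_prefix(flat, ["event_date"]))
```

In an actual Glue job the same idea would be expressed with Spark column operations or Glue's built-in transforms rather than per-record Python, and arrays would need separate handling. The sketch only shows the shape of the output: one flat column per nested field, and a key=value prefix that Athena can use for partition pruning.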
Launching an Amazon EMR cluster each day adds cluster start-up time, requires the team to provision and manage the cluster, and incurs cost even while the cluster is bootstrapping. Athena CTAS can convert JSON to Parquet, but it offers limited support for complex nested transformations, a single CTAS query can write at most 100 partitions, and the conversion may not finish within the two-hour window for 5 TB. Kinesis Data Analytics for Apache Flink (now Amazon Managed Service for Apache Flink) is optimized for streaming rather than large batch files and would be unnecessarily complex and expensive for this workload.