AWS Certified Data Engineer Associate DEA-C01 Practice Question
An ecommerce company stores hourly JSON clickstream files in the s3://raw bucket. The files must be filtered to remove records with null sessionId values, flattened, and written to s3://curated partitioned by date and hour. The solution must automatically adapt when new fields appear, require little ongoing infrastructure management, and scale cost-effectively with fluctuating traffic. Which approach best meets these requirements?
Launch an Amazon EMR cluster with auto-scaling, trigger a Spark application through AWS Step Functions every hour, and use Hive commands to add partitions to the curated bucket.
Create an AWS Glue Spark ETL job that reads the raw files as DynamicFrames from a Data Catalog table with job bookmarks and dynamic partitioning enabled, and schedule the job hourly with Amazon EventBridge.
Configure an S3 ObjectCreated event to invoke an AWS Lambda function that parses each file and writes the filtered results to the curated bucket.
Run an hourly Redshift COPY from the raw bucket into a staging table, transform with SQL, then UNLOAD the curated data back to S3 partitioned by date and hour.
An AWS Glue Spark job reads the data as DynamicFrames, which are self-describing: the schema is inferred from the records at run time, so new JSON fields flow through without manual table alterations. With job bookmarks enabled, the job skips files it has already processed. A Filter transform can drop records with a null sessionId, a flattening transform such as Relationalize or unnest can flatten nested fields, and the partitionKeys option on the S3 sink writes date- and hour-based partitions to s3://curated. Glue is serverless, so there is no cluster to manage and costs accrue only while the job runs. An hourly EventBridge schedule can start the job without an additional orchestration layer.
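To make the three transformation steps concrete, here is a minimal sketch in plain Python of the filter, flatten, and partition logic. It is not Glue code and does not use the Glue API; it only illustrates what the job does to each record. The `eventTime` field, the sample records, and the helper names `flatten`, `partition_prefix`, and `curate` are assumptions made for illustration and do not come from the question.

```python
from datetime import datetime


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def partition_prefix(event_time_iso):
    """Build a Hive-style date/hour prefix from an ISO-8601 timestamp."""
    ts = datetime.fromisoformat(event_time_iso)
    return f"date={ts:%Y-%m-%d}/hour={ts:%H}"


def curate(records):
    """Drop records with a null sessionId, flatten the rest, group by partition."""
    partitions = {}
    for record in records:
        if record.get("sessionId") is None:
            continue  # filter step: skip records with a null or missing sessionId
        flat = flatten(record)
        # `eventTime` is a hypothetical timestamp field used to derive the partition
        prefix = partition_prefix(flat["eventTime"])
        partitions.setdefault(prefix, []).append(flat)
    return partitions


# Hypothetical sample events; the third one carries a new nested field, `page.ref`.
events = [
    {"sessionId": "s1", "eventTime": "2024-05-01T13:05:00", "page": {"url": "/home"}},
    {"sessionId": None, "eventTime": "2024-05-01T13:06:00", "page": {"url": "/cart"}},
    {"sessionId": "s2", "eventTime": "2024-05-01T14:01:00", "page": {"url": "/item", "ref": "ad"}},
]
result = curate(events)
```

The new `page.ref` field passes through without any code change because no column list is fixed in advance, which is the same property the explanation attributes to DynamicFrames.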
An EMR cluster could run the same transformation, but it adds cluster provisioning and scaling overhead, which conflicts with the low-maintenance requirement. Invoking a Lambda function for each object makes large hourly batches hard to handle, because each invocation is capped at a 15-minute timeout and a fixed memory ceiling. Loading into Redshift and unloading back to S3 is indirect and incurs additional storage and compute charges. UNLOAD can partition its output with PARTITION BY, but a staging table with fixed columns does not adapt automatically when new fields appear.
Exam domain: Data Operations and Support