AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company ingests daily JSON files into the s3://sales/raw/ prefix. An AWS Glue Spark job converts the files to Parquet and loads the results into an Amazon Redshift table. The job now takes hours because it reprocesses two years of files every night. You must limit processing to only files added since the last run without changing code. What should you do?
Reconfigure the job as an AWS Glue streaming job that reads from an Amazon Kinesis data stream.
Add an S3 event notification that invokes an AWS Lambda function to call StartJobRun for each new object key.
Enable AWS Glue job bookmarks for the existing job and keep the default run schedule.
Create time-based folders in Amazon S3, define an Athena external table with partition projection, and query it from Redshift Spectrum instead of using ETL.
Enabling job bookmarks is the correct choice. AWS Glue job bookmarks persist state information for each supported source so that subsequent runs skip objects or partitions that have already been processed. You can turn the feature on in the job configuration by setting Job bookmark to Enable in the console, or by passing --job-bookmark-option job-bookmark-enable when starting the job. This change is configuration-only and requires no modifications to the Spark script, provided the script already follows the standard Glue pattern (job.init, job.commit, and a transformation_ctx on each source), as Glue-generated scripts do. The other options either introduce new services, require additional coding, or still scan the entire dataset, so they do not satisfy the requirement.
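As a rough illustration of the run-time option mentioned above, the following Python sketch builds the job arguments and passes them to the Glue StartJobRun API through boto3. The helper names (`bookmark_arguments`, `start_job_with_bookmarks`), the default region, and the job name in the usage note are hypothetical; only the `--job-bookmark-option` key and its three values come from AWS Glue itself. It assumes boto3 is installed and that credentials are already configured.

```python
from typing import Dict, Optional

# The three values AWS Glue accepts for --job-bookmark-option.
BOOKMARK_OPTIONS = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def bookmark_arguments(
    option: str = "job-bookmark-enable",
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the Arguments mapping for a Glue job run (hypothetical helper)."""
    if option not in BOOKMARK_OPTIONS:
        raise ValueError(f"unsupported bookmark option: {option}")
    args = dict(extra or {})  # copy so the caller's dict is not mutated
    args["--job-bookmark-option"] = option
    return args


def start_job_with_bookmarks(job_name: str, region: str = "us-east-1") -> str:
    """Start one run of an existing Glue job with bookmarks enabled.

    Assumes AWS credentials are available; the region default is illustrative.
    """
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue", region_name=region)
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=bookmark_arguments(),
    )
    return response["JobRunId"]


if __name__ == "__main__":
    # Show the arguments without calling AWS.
    print(bookmark_arguments())
```

Calling `start_job_with_bookmarks("sales-raw-to-parquet")` (a made-up job name) would start a single run with bookmarks on. For a scheduled job, setting the option once in the job's own configuration is usually the simpler route, since every triggered run then inherits it.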
Exam domain: Data Operations and Support