AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company stores application logs as compressed JSON files in an Amazon S3 location that is partitioned by the prefix logs/region/date=YYYY-MM-DD. A data engineer created an AWS Glue crawler that builds an Athena table so analysts can run ad-hoc queries. The crawler runs on a daily schedule, but after several months it spends most of its run time re-processing unchanged folders, delaying data availability for the most recent partition.
Which crawler configuration change will minimize the crawl time without requiring code changes to the ingest process?
Switch the crawler trigger to Amazon S3 event notifications so it runs once for every new object.
Configure the crawler to create a separate table for each region/date folder.
Enable partition projection in the Athena table and delete the crawler.
Change the crawler's recrawl behavior to CRAWL_NEW_FOLDERS_ONLY so it processes only folders that were added since the last run.
AWS Glue crawlers keep track of the folders they have already processed. Setting the crawler's recrawl policy to CRAWL_NEW_FOLDERS_ONLY turns the crawler into an incremental crawler: on each run it compares the current contents of the S3 prefix to its internal state and inspects only the folders that appeared since the previous crawl. Existing partitions and their schemas are left untouched, so the crawler finishes quickly while still adding the newest date partition to the catalog. The other options do not meet the requirement. Running the crawler once for every new object depends on S3 event notifications that are not configured and would start far more crawls than the daily schedule needs. Creating a separate table for each region/date folder still scans every folder and fragments the catalog. Partition projection removes the crawler instead of reconfiguring it, and it requires the table and its projection properties to be defined and maintained by hand.
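The recrawl policy described above can be set through the AWS Glue API as well as in the console. The following is a minimal sketch using boto3's `update_crawler` call. The crawler name `app-logs-crawler` and both helper function names are hypothetical and are not part of the original question. The sketch also sets the schema change policy to LOG, on the assumption (from the AWS documentation on incremental crawls) that CRAWL_NEW_FOLDERS_ONLY requires both update and delete behavior to be LOG.

```python
def incremental_crawler_settings(crawler_name: str) -> dict:
    """Build the keyword arguments for glue.update_crawler.

    crawler_name is a placeholder; substitute the real crawler's name.
    """
    return {
        "Name": crawler_name,
        # Only crawl folders added since the last successful run.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Assumption: incremental crawls expect both behaviors set to LOG.
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        },
    }


def enable_incremental_crawl(crawler_name: str, glue_client=None) -> None:
    """Apply the settings to an existing crawler that is not currently running."""
    if glue_client is None:
        import boto3  # imported lazily so the builder above has no dependencies

        glue_client = boto3.client("glue")
    glue_client.update_crawler(**incremental_crawler_settings(crawler_name))


if __name__ == "__main__":
    # Hypothetical crawler name, shown for illustration only.
    print(incremental_crawler_settings("app-logs-crawler"))
```

Calling `enable_incremental_crawl("app-logs-crawler")` with valid AWS credentials would update the crawler in place. The daily schedule and the ingest process stay as they are, which is what the question requires.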
Data Store Management