AWS Certified Data Engineer Associate DEA-C01 Practice Question
Your company stores JSON transaction logs in Amazon S3 using the prefix s3://company-logs/year=/month=/day=/. Analysts query the data with Amazon Athena. You must configure an AWS Glue crawler that automatically adds each new day folder as a Data Catalog partition, deletes the partition when the folder is removed, and finishes quickly by scanning only changed objects. Which Glue crawler settings meet these requirements?
Set RecrawlPolicy RecrawlBehavior = CRAWL_EVERYTHING and SchemaChangePolicy DeleteBehavior = DELETE_FROM_DATABASE.
Set RecrawlPolicy RecrawlBehavior = CRAWL_EVENT_MODE and SchemaChangePolicy DeleteBehavior = DELETE_FROM_DATABASE (UpdateBehavior = LOG).
Set RecrawlPolicy RecrawlBehavior = CRAWL_NEW_FOLDERS_ONLY and SchemaChangePolicy DeleteBehavior = LOG.
Schedule a nightly full crawl with SchemaChangePolicy UpdateBehavior = UPDATE_IN_DATABASE and DeleteBehavior = LOG.
Setting RecrawlPolicy RecrawlBehavior = CRAWL_EVENT_MODE makes the crawler rely on Amazon S3 event notifications, so each run examines only the folders referenced by new object-created or object-removed events instead of listing the whole prefix. This gives fast incremental crawls. Setting SchemaChangePolicy DeleteBehavior = DELETE_FROM_DATABASE tells the crawler to drop a partition from the Glue Data Catalog when an object-removal event indicates that the underlying S3 folder no longer exists. The combination therefore (1) registers new year=/month=/day= folders as partitions, (2) removes partitions whose folders are deleted, and (3) avoids rereading data that hasn't changed.
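As a rough illustration of the settings described above, the following Python sketch builds the keyword arguments for the Glue CreateCrawler API (boto3 `create_crawler`). The crawler name, IAM role ARN, database name, and SQS queue ARN are hypothetical placeholders that do not come from the question. The sketch also assumes that the S3 events reach the crawler through an SQS queue referenced by `EventQueueArn`.

```python
from typing import Any, Dict


def build_crawler_config(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    event_queue_arn: str,
) -> Dict[str, Any]:
    """Return keyword arguments for an event-driven Glue crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                # EventQueueArn points at the SQS queue that receives S3 events.
                {"Path": s3_path, "EventQueueArn": event_queue_arn}
            ]
        },
        # Crawl only what the S3 events say has changed.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
        "SchemaChangePolicy": {
            # Log schema updates, but drop partitions whose folders are gone.
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "DELETE_FROM_DATABASE",
        },
    }


def create_crawler(config: Dict[str, Any]) -> None:
    """Send the configuration to AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the config can be built without boto3

    boto3.client("glue").create_crawler(**config)


# All values below are hypothetical placeholders.
config = build_crawler_config(
    name="transaction-logs-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="logs_db",
    s3_path="s3://company-logs/",
    event_queue_arn="arn:aws:sqs:us-east-1:123456789012:company-logs-events",
)
```

Calling `create_crawler(config)` would submit the request. It is left uncalled here because it requires real AWS credentials and existing resources.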
CRAWL_NEW_FOLDERS_ONLY cannot delete partitions because the service forces DeleteBehavior to LOG. CRAWL_EVERYTHING with DELETE_FROM_DATABASE does remove partitions, but it must list the entire dataset on every run, increasing runtime and cost. A nightly full crawl with DeleteBehavior = LOG also rescans all data, and it only logs removed folders instead of deleting their partitions.
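The elimination reasoning can be sketched as a small check over the four answer options. This is only a model of the explanation's logic, not of the Glue service itself: the `CrawlerOption` class and `meets_requirements` function are hypothetical, and the nightly full crawl is assumed to behave like CRAWL_EVERYTHING.

```python
from dataclasses import dataclass

# Recrawl behaviors that avoid listing the whole dataset on every run.
INCREMENTAL_MODES = {"CRAWL_EVENT_MODE", "CRAWL_NEW_FOLDERS_ONLY"}


@dataclass(frozen=True)
class CrawlerOption:
    label: str
    recrawl_behavior: str
    delete_behavior: str


def meets_requirements(option: CrawlerOption) -> bool:
    """True if the option crawls incrementally and deletes stale partitions."""
    incremental = option.recrawl_behavior in INCREMENTAL_MODES
    # Per the explanation, CRAWL_NEW_FOLDERS_ONLY forces DeleteBehavior to LOG.
    if option.recrawl_behavior == "CRAWL_NEW_FOLDERS_ONLY":
        effective_delete = "LOG"
    else:
        effective_delete = option.delete_behavior
    return incremental and effective_delete == "DELETE_FROM_DATABASE"


options = [
    CrawlerOption("CRAWL_EVERYTHING + DELETE_FROM_DATABASE",
                  "CRAWL_EVERYTHING", "DELETE_FROM_DATABASE"),
    CrawlerOption("CRAWL_EVENT_MODE + DELETE_FROM_DATABASE",
                  "CRAWL_EVENT_MODE", "DELETE_FROM_DATABASE"),
    CrawlerOption("CRAWL_NEW_FOLDERS_ONLY + LOG",
                  "CRAWL_NEW_FOLDERS_ONLY", "LOG"),
    # Assumption: the nightly full crawl is modeled as CRAWL_EVERYTHING.
    CrawlerOption("Nightly full crawl + LOG", "CRAWL_EVERYTHING", "LOG"),
]

passing = [o.label for o in options if meets_requirements(o)]
print(passing)
```

Under these assumptions, only the CRAWL_EVENT_MODE option satisfies both the incremental-scan and the partition-deletion requirements.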