AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineer is building an AWS Glue PySpark job that runs hourly data-quality checks on a 10 TB orders dataset stored in Amazon S3. The data is heavily skewed across 12 distinct values in the order_status column; several rare statuses represent business-critical exceptions. The team must minimize cost by reading only a small fraction of the dataset while guaranteeing that every status is examined during each run. Which sampling technique BEST satisfies these requirements?
Implement stratified sampling on the order_status column so each status contributes a proportionate subset of records to every hourly sample.
Apply reservoir sampling in a single pass to collect a fixed-size subset of records.
Perform simple random sampling without replacement on the entire dataset at a fixed 1 percent rate.
Use systematic sampling by sorting the data and selecting every Nth record.
Stratified sampling divides a population into mutually exclusive strata (here, one per order_status value) and then draws a sample from every stratum. Because each stratum is sampled separately, even infrequent statuses appear in every hourly validation, provided each stratum contributes at least one record, which reduces the risk that critical exceptions are missed. Simple random, systematic, and reservoir sampling all select records without regard to category, so a rare status could be absent from a given sample, violating the requirement that every status is examined.
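To make the idea concrete, here is a minimal pure-Python sketch of stratified sampling with a per-stratum floor. It is an illustration under stated assumptions, not the Glue job itself: the function name `stratified_sample`, the `min_per_stratum` parameter, the status values, and the toy `orders` list are all hypothetical. In a real PySpark job the same grouping would be done on a DataFrame rather than an in-memory list.

```python
import math
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, min_per_stratum=1, seed=None):
    """Draw a proportionate sample from every stratum defined by `key`.

    Each stratum contributes ceil(fraction * size) records, but never fewer
    than `min_per_stratum` (capped at the stratum size), so rare values
    cannot drop out of the sample.
    """
    rng = random.Random(seed)

    # Group records into strata by the value of the key column.
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    sample = []
    for rows in strata.values():
        k = max(min_per_stratum, math.ceil(len(rows) * fraction))
        k = min(k, len(rows))  # cannot take more rows than the stratum holds
        sample.extend(rng.sample(rows, k))  # sampling without replacement
    return sample


# Hypothetical, heavily skewed toy data: one dominant status and two rare ones.
orders = (
    [{"order_id": i, "order_status": "DELIVERED"} for i in range(9_900)]
    + [{"order_id": 9_900 + i, "order_status": "SHIPPED"} for i in range(95)]
    + [{"order_id": 9_995 + i, "order_status": "CHARGEBACK"} for i in range(5)]
)

sample = stratified_sample(orders, "order_status", fraction=0.01, seed=42)
statuses_seen = {rec["order_status"] for rec in sample}
print(len(sample), sorted(statuses_seen))
```

The floor is the design choice that matters here: a purely proportionate 1 percent draw from a five-record stratum could round down to nothing, whereas taking at least one record per stratum keeps every status in every run while still reading only a small fraction of the data.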
Data Operations and Support