AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineer is building an AWS Glue PySpark job that runs hourly data-quality checks on a 10 TB orders dataset stored in Amazon S3. The data is heavily skewed across 12 distinct values in the order_status column; several rare statuses represent business-critical exceptions. The team must minimize cost by reading only a small fraction of the dataset while guaranteeing that every status is examined during each run. Which sampling technique BEST satisfies these requirements?

  • Implement stratified sampling on the order_status column so each status contributes a proportionate subset of records to every hourly sample.

  • Apply reservoir sampling in a single pass to collect a fixed-size subset of records.

  • Perform simple random sampling without replacement on the entire dataset at a fixed 1 percent rate.

  • Use systematic sampling by sorting the data and selecting every Nth record.
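For reference, here is a minimal PySpark sketch of stratified sampling on the order_status column, as described in the first option. It is a simplified illustration, not the full Glue job: the S3 path, the per-status fraction, and the use of a plain SparkSession instead of the Glue boilerplate are assumptions for the example.

```python
# Minimal PySpark sketch of stratified sampling on order_status.
# Assumptions (not from the question): the dataset is Parquet at
# s3://example-bucket/orders/ and a plain SparkSession stands in
# for the AWS Glue job setup.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-dq-sample").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")

# Collect the distinct statuses (12 expected) and assign each one a
# sampling fraction so every stratum appears in the hourly sample.
statuses = [row["order_status"] for row in
            orders.select("order_status").distinct().collect()]
fractions = {status: 0.01 for status in statuses}  # ~1% per status

# sampleBy draws an approximate fraction of rows from each stratum
# without replacement, keyed on the order_status column.
sample = orders.sampleBy("order_status", fractions=fractions, seed=42)

# Run the hourly data-quality checks against the stratified sample.
sample.groupBy("order_status").count().show()
```

In practice, the fractions for the rare, business-critical statuses could be raised above the baseline so those strata always contribute enough rows for the quality checks.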
