AWS Certified Data Engineer Associate DEA-C01 Practice Question
An application writes daily Parquet files to Amazon S3 using the folder structure s3://sensor-data/sensor_date=YYYY-MM-DD/. Next week the development team will add a new optional column named reading_quality to the payload. Several analytics teams query the data through Amazon Athena and Amazon Redshift Spectrum by referencing an AWS Glue Data Catalog table. What is the most efficient way to make the new column available while keeping historical query performance and avoiding large-scale rewrites?
Write new partitions that contain the reading_quality column and alter the existing AWS Glue table to add the column as nullable.
Rewrite all historical Parquet files to include the reading_quality column, then refresh the Glue Data Catalog.
Convert the entire dataset to compressed CSV so that missing columns are ignored during scans.
Create a second Glue table that includes the new column and have users UNION results from both tables when needed.
Parquet supports schema evolution when new fields are appended as nullable columns. Write new partitions that include the reading_quality field, then alter the existing AWS Glue Data Catalog table to add the column; Athena and Redshift Spectrum then return NULL for older files that do not contain the field. Rewriting or duplicating historical data is unnecessary, and converting to another format would increase both cost and latency.
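As a rough illustration of the catalog change described above, the sketch below builds the TableInput that AWS Glue's update_table call expects, with one column appended. The helper name with_added_column, the database name sensors, the table name sensor_data, and the string column type are all assumptions for the example; the question does not state them. The list of accepted TableInput keys is a subset chosen for this sketch, not the full API definition.

```python
import copy

# Subset of keys that Glue's update_table accepts inside TableInput.
# get_table returns extra read-only keys (DatabaseName, CreateTime, ...)
# that must be dropped before the dict is sent back. Assumed subset.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
}


def with_added_column(table, name, col_type, comment=""):
    """Return a TableInput dict with one column appended to the schema.

    `table` is the "Table" dict returned by Glue's get_table.
    The input dict is not modified.
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    columns = table_input["StorageDescriptor"]["Columns"]
    # Do nothing if the column is already in the catalog schema.
    if any(col["Name"] == name for col in columns):
        return table_input
    new_column = {"Name": name, "Type": col_type}
    if comment:
        new_column["Comment"] = comment
    # Append at the end so existing column positions stay unchanged.
    columns.append(new_column)
    return table_input


# Hypothetical usage with boto3 (database and table names are assumptions):
#   import boto3
#   glue = boto3.client("glue")
#   table = glue.get_table(DatabaseName="sensors", Name="sensor_data")["Table"]
#   glue.update_table(
#       DatabaseName="sensors",
#       TableInput=with_added_column(table, "reading_quality", "string"),
#   )
```

Only catalog metadata changes here; no historical Parquet file is touched. A comparable result could likely be reached from Athena with a DDL statement such as ALTER TABLE sensor_data ADD COLUMNS (reading_quality string), again with the table name and type assumed.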
Incorrect choices:
Rewriting all prior partitions wastes compute and I/O and provides no performance benefit.
Maintaining two separate tables and UNIONing them adds query complexity and duplicates catalog metadata.
Converting to CSV removes columnar compression and predicate pushdown, degrading performance.
Data Store Management