AWS Certified Solutions Architect Associate SAA-C03 Practice Question
Your client stores terabytes of comma-separated values (CSV) files in Amazon S3. To speed up future analytics queries and reduce storage costs, they plan to convert these files to the Apache Parquet columnar format. They want a fully managed, low-maintenance solution that can handle the extract, transform, and load (ETL) workload at scale. Which AWS service should they use?
AWS Glue is a serverless, fully managed data-integration service that can run ETL jobs to read CSV files from Amazon S3, transform the data, and write it back in columnar formats such as Parquet or ORC. Glue automatically provisions and scales the underlying resources, so there is no infrastructure to manage.
Amazon S3 Batch Operations can invoke a Lambda function for each object, but you must author and maintain that function and manage concurrency limits, which adds operational effort.
AWS Lambda alone would require you to orchestrate triggers, retries, and scaling logic manually.
Amazon Kinesis Data Firehose is designed for ingesting and transforming streaming data; it cannot bulk-process objects that already exist in S3.
Therefore, AWS Glue best meets the requirements with the least maintenance overhead.
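To make the Glue option more concrete, here is a minimal sketch of how such a job might be defined with the AWS SDK for Python (boto3). It only illustrates the shape of a job definition: the job name, IAM role, bucket paths, worker settings, and the custom `--source_path` / `--target_path` arguments are all assumptions, and the ETL script that would actually read the CSV files and write Parquet is assumed to exist at the given S3 location.

```python
from typing import Any, Dict


def build_csv_to_parquet_job(
    job_name: str,
    role_arn: str,
    script_s3_uri: str,
    source_s3_uri: str,
    target_s3_uri: str,
    workers: int = 10,
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 Glue client's create_job call.

    All names and paths are supplied by the caller; nothing here is
    prescribed by the practice question itself.
    """
    for uri in (script_s3_uri, source_s3_uri, target_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    return {
        "Name": job_name,
        "Role": role_arn,
        # "glueetl" selects a Spark ETL job; the script itself lives in S3.
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        # Version and worker sizing are illustrative assumptions.
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
        # Hypothetical custom arguments that the ETL script would read
        # to locate the CSV input and the Parquet output.
        "DefaultArguments": {
            "--source_path": source_s3_uri,
            "--target_path": target_s3_uri,
        },
    }


def create_job(job_args: Dict[str, Any]) -> Dict[str, Any]:
    """Register the job with AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**job_args)
```

Building the definition separately from the API call keeps the sketch inspectable without an AWS account. Calling `create_job(build_csv_to_parquet_job(...))` would register the job, after which Glue provisions the workers when the job runs.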