AWS Certified Solutions Architect Associate (SAA-C03) Practice Question
A data engineering team stores hundreds of gigabytes of raw CSV files in an Amazon S3 data lake. They need to convert this data to Apache Parquet on a daily schedule as part of an ETL pipeline. The team wants a fully managed, serverless solution that lets them define the pipeline visually and perform the conversion without writing any code. Which AWS service or feature best meets these requirements?
Launch an Amazon EMR cluster running a custom Spark script that converts the files.
Configure Amazon S3 event notifications to trigger an AWS Lambda function that runs a Python conversion script.
Create an AWS Glue Studio visual ETL job that reads the CSV files and writes the output in Parquet format.
Set up AWS Data Pipeline with a ShellCommandActivity that uses the parquet-mr tool to rewrite the files.
AWS Glue Studio lets you build ETL pipelines through a drag-and-drop interface, generates the underlying Apache Spark code for you, and runs the job on AWS Glue's serverless Spark engine. You can configure a visual job that reads the CSV files from Amazon S3 and writes the output as Parquet without authoring any conversion code, then attach a schedule trigger so the job runs daily. The other options all fail the no-code requirement: Amazon EMR additionally requires you to provision and manage a cluster, while the Lambda and Data Pipeline approaches depend on custom conversion scripts that you must write and maintain.
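For context, the visual job compiles down to an ordinary PySpark script built on the AWS Glue libraries; Glue Studio emits this code for you rather than you writing it. Below is a minimal sketch of what such a generated CSV-to-Parquet job looks like, with the S3 bucket paths as hypothetical placeholders.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job setup: resolve the job name argument and create contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source node: read the raw CSV files from the data lake.
# s3://example-data-lake/raw/ is a placeholder path, not from the question.
csv_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-data-lake/raw/"], "recurse": True},
    format="csv",
    format_options={"withHeader": True, "separator": ","},
)

# Target node: write the same records back to S3 in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=csv_frame,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/parquet/"},
    format="parquet",
)

job.commit()
```

Scheduling the daily run is then a matter of attaching a time-based (cron) trigger to the job, still without writing any code.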