AWS Certified Solutions Architect - Professional (SAP-C02) Practice Question
A financial analytics company runs a critical overnight ETL job using a self-managed Apache Spark cluster on a fleet of r5.4xlarge EC2 instances. The job runs for approximately 4 hours each night, but the cluster remains active 24/7 to be ready for the next run, leading to high costs from idle resources. The data processing volume can fluctuate by up to 50% day-to-day. The operations team spends considerable time on cluster maintenance, security patching, and managing Spark versions. A solutions architect has been tasked with proposing a new architecture that most significantly reduces the Total Cost of Ownership (TCO) while maintaining the processing capabilities. Which AWS managed service offering should the architect recommend?
Migrate the ETL job to AWS Glue and schedule it to run nightly in Glue's managed, serverless Spark environment.
Keep the existing cluster architecture but purchase an EC2 Instance Savings Plan for the r5 instance family to cover the EC2 usage.
Containerize the Spark application and orchestrate it using AWS Batch with AWS Fargate compute environments.
Re-platform the job to run on a transient Amazon EMR cluster that uses Spot Instances for task nodes.
The correct answer is to migrate the ETL job to AWS Glue. AWS Glue is a fully managed, serverless data integration service that runs ETL jobs on a managed Apache Spark environment. It is the most cost-effective option because it directly addresses the two primary sources of high TCO in the scenario: operational overhead and idle compute. With Glue, the company pays only for the Data Processing Units (DPUs) consumed while the job is actively running, eliminating the cost of the roughly 20 hours of daily idle time. Because capacity is allocated per job run, the 50% day-to-day swings in data volume are also absorbed without manual resizing. AWS manages all of the underlying infrastructure, including provisioning, patching, and Spark version upgrades, which removes the maintenance burden from the operations team.
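As a rough illustration of the pay-per-run model, here is a minimal boto3 sketch of a scheduled Glue job. All names, ARNs, paths, and sizing values are hypothetical placeholders, not details from the scenario:

```python
import boto3

# Hypothetical sketch: a scheduled, serverless Glue Spark job.
# Job name, role ARN, script path, and worker sizing are illustrative.
glue = boto3.client("glue", region_name="us-east-1")

# Create the Spark ETL job; billing accrues in DPU-hours only while it runs.
glue.create_job(
    Name="nightly-financial-etl",
    Role="arn:aws:iam::123456789012:role/GlueEtlRole",
    Command={
        "Name": "glueetl",  # Spark ETL job type
        "ScriptLocation": "s3://example-bucket/scripts/etl_job.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.2X",    # each G.2X worker maps to 2 DPUs
    NumberOfWorkers=10,   # starting size; can be tuned per run
)

# Fire the job on a nightly schedule; no cluster exists between runs.
glue.create_trigger(
    Name="nightly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",  # 02:00 UTC daily
    Actions=[{"JobName": "nightly-financial-etl"}],
    StartOnCreation=True,
)
```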
Amazon EMR with Spot Instances: This is a plausible but less optimal solution. While using a transient EMR cluster with Spot Instances would significantly reduce compute costs compared to the current setup, it does not fully eliminate management overhead. The team would still need to configure, launch, and manage the EMR cluster lifecycle. For a pure, scheduled ETL workload, the completely serverless nature of Glue offers a lower overall TCO.
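For comparison, a hedged sketch of the transient-EMR approach is below; the release label, instance counts, and script path are illustrative assumptions. Note how much cluster lifecycle configuration the team still owns:

```python
import boto3

# Hypothetical sketch of a transient EMR cluster for one nightly run.
# Names, release label, and instance counts are illustrative assumptions.
emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="nightly-etl-transient",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "r5.4xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "r5.4xlarge", "InstanceCount": 2},
            # Task nodes on Spot, per the option in the question.
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "r5.4xlarge", "InstanceCount": 6},
        ],
        # Transient: terminate the cluster when the last step finishes.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "run-spark-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/scripts/etl_job.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```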
AWS Batch with Fargate compute environments: This is incorrect because AWS Batch is a general-purpose batch scheduler, not a managed Spark runtime, and Fargate compute environments do not support the multi-node parallel jobs that a distributed Spark cluster would require. Containerizing the Spark application and wiring its driver and executors together under AWS Batch would demand significant custom engineering and ongoing maintenance, increasing complexity and TCO compared with a dedicated managed Spark service.
Keeping the existing cluster with an EC2 Instance Savings Plan: This is the least effective option. A Savings Plan provides a discount in exchange for a commitment to a consistent amount of compute spend over a 1- or 3-year term. Applying one to a cluster that sits idle 20 hours a day means committing to pay, even at a discount, for resources that are mostly unused, which locks in the inefficiency instead of eliminating it. It also does nothing to reduce the team's patching and cluster-maintenance burden.
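A back-of-the-envelope comparison makes the lock-in concrete. The hourly price and discount rate below are assumptions for illustration, not published AWS rates:

```python
# Illustrative arithmetic only; the hourly price and discount percentage
# are assumed values, not actual AWS pricing.
on_demand_hourly = 1.00       # assumed $/hour for one r5.4xlarge
savings_plan_discount = 0.40  # assumed 40% discount for a committed term

# Savings Plan: committed 24/7, even though the job needs ~4 hours/night.
committed_daily_cost = 24 * on_demand_hourly * (1 - savings_plan_discount)

# Right-sized alternative: pay only for the ~4 hours of actual processing.
usage_based_daily_cost = 4 * on_demand_hourly

print(f"Committed 24/7 with discount: ${committed_daily_cost:.2f}/day")    # $14.40
print(f"Pay only while running:       ${usage_based_daily_cost:.2f}/day")  # $4.00
```

Even with a generous discount, the committed 24/7 spend per instance is several times the cost of paying on-demand rates for only the hours the job actually runs.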