A mobile-gaming company currently pushes every JSON click-event (≈150 000 events per second) to a cloud object store as an individual file via a REST API call. The finance team reports that API request charges and data-transfer fees have become a major portion of the monthly bill. Analysts only need aggregated dashboards refreshed every 10 minutes. To cut costs while meeting the 10-minute service-level agreement, which change best applies the batching ingestion pattern?
Keep per-event uploads but convert the payload format from JSON to Avro for smaller messages.
Implement a change-data-capture tool that streams every new record to cloud storage in real time.
Publish each event to a Kafka topic and process it with Spark Structured Streaming in continuous mode.
Aggregate events into compressed Parquet files (about 128 MB each) and upload them every five minutes.
Grouping many events into a single compressed object and uploading it on a fixed schedule is the essence of batching. At roughly 150,000 events per second, per-event uploads generate on the order of 13 billion PUT requests per day; batching five minutes of traffic (about 45 million events) into 128 MB Parquet files collapses tens of millions of requests per window into a small number of object uploads. Each PUT now carries thousands of records instead of one, and the columnar Parquet format compresses aggressively, lowering both request and transfer costs while still delivering data well within the 10-minute latency budget. Switching to real-time streaming or change data capture would increase, not decrease, request volume and infrastructure complexity. Merely converting JSON to Avro without bundling events keeps the one-event-per-call pattern; it shrinks each payload slightly but does not address the dominant cost driver: the sheer number of requests.
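A minimal sketch of this micro-batch pattern, assuming Python with pyarrow for Parquet output and boto3 for the object-store upload; the bucket name, event schema, and flush thresholds are hypothetical placeholders chosen for illustration, not values from the scenario.

```python
import io
import time
import uuid

import boto3                   # assumption: an S3-compatible object store
import pyarrow as pa
import pyarrow.parquet as pq


class MicroBatchUploader:
    """Accumulates click events in memory and flushes them to object storage
    as one compressed Parquet file per batch window, instead of one PUT per event."""

    def __init__(self, bucket: str, window_seconds: int = 300, max_events: int = 5_000_000):
        self.bucket = bucket                  # hypothetical bucket name
        self.window_seconds = window_seconds  # five-minute flush schedule
        self.max_events = max_events          # rough stand-in for the ~128 MB file-size target
        self.s3 = boto3.client("s3")
        self.buffer: list[dict] = []
        self.window_start = time.time()

    def ingest(self, event: dict) -> None:
        """Called once per incoming event; uploads only when the window closes."""
        self.buffer.append(event)
        if (time.time() - self.window_start >= self.window_seconds
                or len(self.buffer) >= self.max_events):
            self.flush()

    def flush(self) -> None:
        """Write the buffered events as a compressed, columnar Parquet object
        and upload it with a single PUT request."""
        if not self.buffer:
            return
        table = pa.Table.from_pylist(self.buffer)           # columnar layout
        sink = io.BytesIO()
        pq.write_table(table, sink, compression="snappy")   # compression shrinks transfer size
        key = f"clicks/{time.strftime('%Y/%m/%d/%H%M%S')}-{uuid.uuid4().hex}.parquet"
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=sink.getvalue())
        self.buffer = []
        self.window_start = time.time()


# Usage: one uploader shared by the ingestion workers (names are illustrative).
uploader = MicroBatchUploader(bucket="example-click-events")
uploader.ingest({"player_id": "p-123", "action": "click", "ts": time.time()})
```

In practice the same buffer-and-flush behavior is usually delegated to a managed sink (for example, Amazon Kinesis Data Firehose or a Kafka Connect S3 sink configured for Parquet output) rather than hand-rolled code, but the cost mechanics are identical: one request per large object instead of one per event.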