Your company ingests hundreds of millions of payment events per day into a partitioned BigQuery table through Dataflow for near-real-time analytics. A newly built back-office application must apply multi-row corrections and read the updated balances within the same transaction. You must meet this strict ACID requirement while preserving the existing analytic workflows and avoiding manual capacity management as data volume grows. What should you do?
Ingest the events into Cloud Spanner instead, let the back-office service perform its transactional updates there, and expose the data to analysts by querying Cloud Spanner directly from BigQuery through a connection.
Periodically export the BigQuery table to Cloud Storage, run Dataflow to apply corrections, and reload the cleansed dataset into a replacement BigQuery table.
Keep all data in BigQuery and have the back-office service issue multi-statement transactions with UPDATE statements through the BigQuery API.
Migrate the dataset to Cloud SQL for PostgreSQL, scale the instance vertically as volume grows, and replicate changes to BigQuery nightly using Datastream for analytics.
BigQuery is optimized for large-scale analytical workloads. Its DML statements are atomic and it supports multi-statement transactions, but it is not designed for high-concurrency, row-level OLTP traffic in which an application corrects several rows and reads the new balances back within the same low-latency transaction. Cloud Spanner provides fully managed, horizontally scalable relational storage with external consistency and full ACID transactions, and BigQuery can query it through a Cloud Spanner connection (a federated query), so analytics continue without exporting data. Keeping everything in BigQuery and issuing UPDATE statements in multi-statement transactions would therefore not deliver the required OLTP behavior or performance. Export-and-reload and nightly replication both introduce unacceptable latency, and Cloud SQL scaled vertically requires manual capacity planning and does not scale elastically to hundreds of millions of daily events.
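The two halves of the recommended design can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Accounts` table, its `AccountId` and `BalanceCents` columns, and the helper names `apply_corrections`, `federated_query`, and `run` are all hypothetical. The sketch assumes the `google-cloud-spanner` client library, whose `run_in_transaction` method passes a transaction object to a callback, and BigQuery's `EXTERNAL_QUERY` function for reading Spanner through a connection.

```python
"""Sketch: transactional corrections in Cloud Spanner, analytics via BigQuery.

Assumed (hypothetical) schema: Accounts(AccountId STRING, BalanceCents INT64).
"""


def apply_corrections(transaction, corrections, types):
    """Apply balance deltas, then read the new balances in the same transaction.

    `transaction` is expected to behave like the object google-cloud-spanner
    hands to a `run_in_transaction` callback; `types` is expected to be
    `spanner.param_types`. `corrections` maps account id -> delta in cents.
    """
    # Step 1: multi-row corrections, one DML statement per account.
    for account_id, delta_cents in corrections.items():
        transaction.execute_update(
            "UPDATE Accounts SET BalanceCents = BalanceCents + @delta "
            "WHERE AccountId = @id",
            params={"id": account_id, "delta": delta_cents},
            param_types={"id": types.STRING, "delta": types.INT64},
        )

    # Step 2: read the updated balances before the transaction commits.
    balances = {}
    for account_id in corrections:
        rows = list(
            transaction.execute_sql(
                "SELECT BalanceCents FROM Accounts WHERE AccountId = @id",
                params={"id": account_id},
                param_types={"id": types.STRING},
            )
        )
        # None marks an account id that does not exist in the table.
        balances[account_id] = rows[0][0] if rows else None
    return balances


def federated_query(connection_id):
    """Build a BigQuery SQL string that reads Spanner through a connection.

    `connection_id` has the form "project.location.connection_name".
    """
    return (
        "SELECT AccountId, BalanceCents\n"
        f"FROM EXTERNAL_QUERY('{connection_id}',\n"
        "  'SELECT AccountId, BalanceCents FROM Accounts')"
    )


def run(instance_id, database_id, corrections):
    """Run the corrections against a real database (needs google-cloud-spanner)."""
    from google.cloud import spanner  # imported lazily; external dependency

    database = spanner.Client().instance(instance_id).database(database_id)
    # Extra positional arguments are forwarded to the callback.
    return database.run_in_transaction(
        apply_corrections, corrections, spanner.param_types
    )
```

Because the client may re-run the callback if the transaction aborts, `apply_corrections` keeps no state outside the transaction. Analysts would submit the string returned by `federated_query` as an ordinary BigQuery query, so existing analytic workflows need no export step.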