A data science team is deploying a real-time fraud detection model for a financial institution. The system's architecture requires that the data used for model inference be perfectly consistent with the primary transactional database at all times. Any lag in data propagation could lead to significant financial loss. Given this strict requirement for data integrity and consistency, which data replication strategy is the most appropriate for the underlying feature store?
Snapshot replication, as it creates point-in-time copies of the data, which is ideal for versioning datasets for model reproducibility and retraining.
Multi-master replication, as it allows writes to any node, maximizing availability and write performance across geographically distributed locations.
Asynchronous replication, as it provides the lowest write latency by acknowledging writes before they are committed to the replica, which is ideal for high-throughput systems.
Synchronous replication, as it guarantees a write operation is committed to both the primary and replica before returning success, ensuring zero data loss (RPO=0) at the cost of higher write latency.
The correct answer is synchronous replication. In the scenario, the most critical requirement is perfect consistency between the primary database and the replica used for inference to prevent fraud based on stale data. Synchronous replication guarantees this by writing data to both the primary and replica locations before confirming the transaction is complete. This ensures a Recovery Point Objective (RPO) of zero, meaning no data is lost upon a failure.
Asynchronous replication prioritizes performance by confirming writes on the primary before they are sent to the replica, which introduces a data lag. This lag is unacceptable in a real-time fraud detection system where perfect consistency is required.
Snapshot replication is used for creating point-in-time backups for disaster recovery or versioning datasets for model training, but it is not a real-time solution and does not provide the continuous consistency needed for live inference.
Multi-master replication allows writes on multiple nodes but typically provides eventual consistency and adds complexity around conflict resolution, making it less suitable than synchronous replication when absolute, immediate consistency is the primary goal.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is synchronous replication?
Open an interactive chat with Bash
Why is asynchronous replication not suitable for real-time systems?
Open an interactive chat with Bash
What is the difference between snapshot replication and synchronous replication?