You are migrating a TensorFlow 2.x computer-vision project to a workstation that contains four identical GPUs. The team wants to keep the code changes minimal (essentially just wrapping the model's construction and compile calls inside a distribution-strategy scope), and they require synchronous data-parallel training that keeps replicas in lock-step without involving additional machines or any asynchronous parameter servers. Which tf.distribute strategy should you choose?
tf.distribute.MirroredStrategy is designed for synchronous training on one machine that has multiple GPUs. Each replica processes a shard of the batch, gradients are reduced with NCCL-backed all-reduce ops, and variables stay synchronized automatically; using it typically only requires placing model creation and compile() inside strategy.scope(). No other cluster setup is necessary.
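A minimal sketch of that pattern is shown below. The model architecture, dataset name (train_ds), and hyperparameters are placeholders, not part of the question; the point is simply that only the construction and compile() calls move inside the scope.

```python
import tensorflow as tf

# MirroredStrategy picks up all visible GPUs on this machine by default.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here become mirrored variables, one copy per GPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Training code stays unchanged; Keras shards each global batch across replicas.
# model.fit(train_ds, epochs=10)
```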
MultiWorkerMirroredStrategy also provides synchronous training but adds the overhead of configuring multiple workers, which is unnecessary in a single-node scenario.
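For contrast, a rough sketch of the extra configuration MultiWorkerMirroredStrategy expects: each worker must see a TF_CONFIG cluster spec before the strategy is created. The hostnames, ports, and task index below are illustrative placeholders; none of this is needed with MirroredStrategy on a single machine.

```python
import json
import os
import tensorflow as tf

# Placeholder cluster spec: two workers, this process is worker 0.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# The strategy reads TF_CONFIG at construction time to join the cluster.
strategy = tf.distribute.MultiWorkerMirroredStrategy()
```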
ParameterServerStrategy is primarily for asynchronous (or hybrid) training across workers and parameter-server nodes, so it does not match the synchronous, single-machine requirement.
CentralStorageStrategy keeps all variables on one device (often the CPU) and mirrors computation to GPUs, which can become a performance bottleneck compared with MirroredStrategy when multiple GPUs are available. Therefore, MirroredStrategy is the best fit.