GCP Professional Data Engineer Practice Question

A media company streams thousands of image URIs from on-premises cameras to Pub/Sub. A Dataflow streaming pipeline must add the output of an existing custom TensorFlow model that is already deployed as a Vertex AI online prediction endpoint in us-central1. Business requirements state: (1) each image must be enriched with model predictions within 500 ms even during peaks of 200 images per second, (2) all data and inference traffic must stay on Google's private network without public internet egress, and (3) per-prediction cost must be minimized. Which design for the Dataflow transformation best meets these requirements?

Package the TensorFlow SavedModel with the Dataflow worker container and run in-process inference on autoscaled GPU workers, avoiding any calls to Vertex AI.
Window incoming events for one minute, write the images to Cloud Storage, launch a Vertex AI batch prediction job, and join the asynchronous results back into the stream.
Add a ParDo that sends each image to the public Vertex AI online prediction REST endpoint over the internet from the Dataflow workers.
Expose the Vertex AI online prediction endpoint through Private Service Connect, disable public IPs on Dataflow workers, and invoke the private endpoint from a ParDo so the pipeline receives low-latency, in-region predictions that scale with demand.


Answer Description

Invoking the existing Vertex AI online prediction endpoint through Private Service Connect (PSC) keeps all traffic on Google's private network. Disabling public IPs on the Dataflow workers removes their direct path to the internet. Together these satisfy the no-egress requirement. Online prediction is designed for real-time, low-latency inference, typically tens to hundreds of milliseconds per request. It can also autoscale the number of model-server replicas to absorb high, bursty throughput such as 200 images per second, which keeps each enrichment within the 500 ms budget.

Cost stays low because the pipeline reuses the model that is already deployed. You pay for the node-hours of the serving replicas, which scale with demand, plus modest PSC charges. You avoid attaching GPUs to every autoscaled Dataflow worker.

The other options each fail a requirement. Calling the public endpoint over the internet violates the no-egress requirement. Batch prediction jobs are asynchronous and add minutes of latency. Packaging the model inside every Dataflow worker inflates resource usage and cost. It also complicates model version management, because each model update means rebuilding the worker container and redeploying the pipeline. Therefore, calling the PSC-exposed Vertex AI online prediction endpoint from a ParDo in the Dataflow pipeline is the correct approach.
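
A minimal sketch of the per-element ParDo logic, written in Python (the language of the Beam Python SDK). To stay self-contained, it imports neither Apache Beam nor the Vertex AI SDK. `EnrichWithPrediction` only mirrors the `setup`/`process` shape of a Beam `DoFn`. `predict_fn` is a hypothetical callable standing in for the request to the PSC-exposed endpoint. The 250 ms round-trip latency used below is an assumed figure, not a measured one.

```python
import math
from typing import Callable, Dict, Iterable, List


def required_concurrency(rate_per_s: float, latency_s: float) -> int:
    """Little's law: average in-flight requests = arrival rate x time per request."""
    return math.ceil(rate_per_s * latency_s)


class EnrichWithPrediction:
    """Mimics the setup/process shape of a Beam DoFn without depending on Beam.

    predict_fn is a hypothetical stand-in for the call to the Vertex AI endpoint
    exposed through Private Service Connect. It takes a list of instances and
    returns one prediction per instance, in the same order.
    """

    def __init__(self, predict_fn: Callable[[List[dict]], List[dict]]):
        self._predict_fn = predict_fn

    def setup(self) -> None:
        # In a real DoFn, the prediction client would be created here, once per
        # worker, so connections are reused across elements.
        self._client = self._predict_fn

    def process(self, element: Dict[str, str]) -> Iterable[Dict[str, object]]:
        # One online-prediction request per element; the image URI is the input.
        [prediction] = self._client([{"image_uri": element["image_uri"]}])
        # Emit the original event enriched with the model output.
        yield {**element, "prediction": prediction}


if __name__ == "__main__":
    def fake_predict(instances: List[dict]) -> List[dict]:
        # Hypothetical model stub: returns a fixed label for every instance.
        return [{"label": "vehicle"} for _ in instances]

    dofn = EnrichWithPrediction(fake_predict)
    dofn.setup()
    print(list(dofn.process({"image_uri": "gs://example-bucket/cam1/0001.jpg"})))
    # 200 images/s with an assumed 250 ms round trip -> 50 requests in flight.
    print(required_concurrency(200, 0.25))
```

In a real pipeline, the class would subclass `beam.DoFn` and be applied with `beam.ParDo(...)` after the Pub/Sub read. The Little's-law helper shows why the endpoint must scale with demand. At 200 images per second and an assumed 250 ms round trip, about 50 requests are in flight at any moment. The Dataflow workers need that much aggregate concurrency, and the endpoint needs enough replicas to serve it.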
