GCP Professional Data Engineer Practice Question

Your team runs a Python Dataflow streaming pipeline that ingests 20,000 JSON events per second from Pub/Sub and writes enriched rows to BigQuery. Each event must be classified by an existing Vertex AI model, and the end-to-end latency budget is 200 ms even during peak load. The solution must stay fully serverless and keep prediction cost as low as possible. How should you integrate the inference step into the pipeline?

  • Write events to Cloud Storage and launch a Vertex AI batch prediction job every minute, then read the output back into the streaming pipeline.

  • Use a GroupIntoBatches transform to assemble small bundles of events and send each bundle as a single gRPC request to the Vertex AI online prediction endpoint from the Dataflow worker.

  • Stream events directly into BigQuery and execute a scheduled BigQuery ML remote-model query every 30 seconds to populate the classification column.

  • Invoke the Vertex AI online prediction endpoint synchronously for each individual event inside a MapElements transform.
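For reference, the micro-batching pattern described in the second option can be sketched in plain Python (outside Beam). This is an illustrative sketch only: `group_into_batches` mirrors what Beam's `GroupIntoBatches` transform does, `predict_batch` is a hypothetical stand-in for one gRPC call to a Vertex AI online prediction endpoint, and the batch size of 64 is an assumption to be tuned against the 200 ms latency budget, not a recommendation.

```python
# Sketch of micro-batching events before calling an online prediction
# endpoint. All names here are illustrative stand-ins, not real APIs.

from typing import Iterable, List

BATCH_SIZE = 64  # assumed bundle size; tune against the latency budget


def group_into_batches(events: Iterable[dict],
                       batch_size: int = BATCH_SIZE) -> List[List[dict]]:
    """Assemble events into fixed-size bundles, mirroring Beam's
    GroupIntoBatches transform."""
    batch: List[dict] = []
    batches: List[List[dict]] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:  # flush the final partial bundle
        batches.append(batch)
    return batches


def predict_batch(batch: List[dict]) -> List[str]:
    """Placeholder for a single gRPC request carrying the whole bundle."""
    return ["classified"] * len(batch)


# One request per bundle instead of one per event: at 20,000 events/s
# and a bundle size of 64, that is roughly 313 requests/s instead of
# 20,000, which is how this option keeps prediction cost low.
events = [{"id": i} for i in range(200)]
batches = group_into_batches(events)
predictions = [p for b in batches for p in predict_batch(b)]
```

The trade-off the question is probing: per-event synchronous calls (the last option) multiply request overhead and cost, while batch prediction jobs or scheduled queries (the other two options) cannot meet a 200 ms streaming latency budget.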

Domain: Ingesting and processing the data