GCP Professional Data Engineer Practice Question

Your organization runs a high-throughput Apache Kafka cluster in an on-premises data center. Security policy forbids any outbound connections from the data center; only clients in Google Cloud may initiate TLS-encrypted, private (VPN/Interconnect) connections to on-prem resources. You must build a near-real-time pipeline that ingests the Kafka messages into BigQuery with minimal additional operational overhead. Which approach should you choose for the streaming source of the pipeline?

  • Configure Kafka Connect to batch-export topic data to Cloud Storage every minute, then run a scheduled Dataflow batch job to load the files into BigQuery.

  • Refactor the on-prem producers to send events directly to Cloud Pub/Sub over the internet and have Dataflow read from the Pub/Sub subscription.

  • Run a serverless Dataflow streaming job that uses the Apache Kafka I/O connector to consume the on-prem Kafka topic over the private VPN and stream the data into BigQuery.

  • Deploy Pub/Sub Lite on-premises with Pub/Sub Edge, publish the Kafka messages into the new topic, and have Dataflow read from Pub/Sub Lite.
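Of the four options, only the serverless Dataflow streaming job with the Apache Kafka I/O connector satisfies every constraint: the Dataflow workers run in Google Cloud and initiate the TLS connection to the on-prem brokers over the private VPN/Interconnect (so nothing in the data center opens an outbound connection), the pipeline is continuous rather than minute-level batch, and the managed service adds minimal operational overhead. Below is a minimal sketch of such a pipeline using the Beam Java SDK's KafkaIO; the broker address, topic name, BigQuery table, and one-column schema are hypothetical placeholders, and the workers are assumed to run on a VPC subnetwork that routes to the data center.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToBigQuery {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    p.apply("ReadFromKafka", KafkaIO.<String, String>read()
            // Hypothetical broker address, reachable over the private VPN/Interconnect route.
            .withBootstrapServers("kafka.onprem.example:9093")
            .withTopic("events") // hypothetical topic name
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // TLS as the security policy requires; the client side (in Google Cloud) initiates.
            .withConsumerConfigUpdates(Map.<String, Object>of("security.protocol", "SSL"))
            .withoutMetadata()) // drop Kafka metadata, keep KV<key, value>
     .apply("ToTableRow", MapElements.into(TypeDescriptor.of(TableRow.class))
            // Illustrative single-column row; a real pipeline would parse the payload.
            .via(kv -> new TableRow().set("payload", kv.getValue())))
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.events") // hypothetical destination table
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run(); // unbounded source, so Dataflow executes this as a streaming job
  }
}
```

Because the source is unbounded, BigQueryIO defaults to streaming inserts, keeping the end-to-end path near-real-time. By contrast, the Kafka Connect option introduces a minute-level batch hop through Cloud Storage, refactoring producers to publish to Pub/Sub would require outbound internet connections from the data center, and Pub/Sub Lite is a Google Cloud service that cannot be deployed on-premises.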

Exam: GCP Professional Data Engineer
Objective: Ingesting and processing the data