Your organization runs a high-throughput Apache Kafka cluster in an on-premises data center. Security policy forbids any outbound connections from the data center; only clients in Google Cloud may initiate TLS-encrypted, private (VPN/Interconnect) connections to on-prem resources. You must build a near-real-time pipeline that ingests the Kafka messages into BigQuery with minimal additional operational overhead. Which approach should you choose for the streaming source of the pipeline?
Run a serverless Dataflow streaming job that uses the Apache Kafka I/O connector to consume the on-prem Kafka topic over the private VPN and stream the data into BigQuery.
Refactor the on-prem producers to send events directly to Cloud Pub/Sub over the internet and have Dataflow read from the Pub/Sub subscription.
Deploy Pub/Sub Lite on-premises with Pub/Sub Edge, publish the Kafka messages into the new topic, and have Dataflow read from Pub/Sub Lite.
Configure Kafka Connect to batch-export topic data to Cloud Storage every minute, then run a scheduled Dataflow batch job to load the files into BigQuery.
Because connections may be initiated only from Google Cloud toward the on-premises environment, the streaming system in Google Cloud must act as a Kafka consumer that connects to the existing brokers across the private network. A Dataflow job that uses the built-in Apache Kafka I/O connector meets this need. Dataflow workers open connections from Google Cloud to the specified bootstrap servers, continuously consume records, and pass them on for transformation and loading into BigQuery. This approach avoids introducing or managing a new message broker and meets the low-latency requirement. It also keeps operational overhead low because Dataflow is fully managed.
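The consumer side of this design can be sketched with the Apache Beam Python SDK. Its `ReadFromKafka` is a cross-language wrapper around the Java `KafkaIO`, so a Java expansion service is involved when the pipeline is built. The sketch makes several assumptions, and every identifier shown is hypothetical:

- The helper names `build_consumer_config`, `parse_record`, and `run` are placeholders, as are the broker addresses, topic, table, and truststore path.
- Messages are UTF-8 JSON objects whose fields match an existing BigQuery table.
- The TLS truststore file already exists at the given path inside each worker's container, for example baked into a custom image. Distributing it is not shown.

A Kafka consumer uses the bootstrap servers only for initial metadata. It then connects to each broker's advertised listener, so those addresses must also be routable from the Dataflow workers' subnetwork over the VPN or Interconnect.

```python
"""Sketch: Dataflow streaming job that consumes on-prem Kafka over TLS into BigQuery.

All names, addresses, paths, and the JSON message format are hypothetical.
"""
import json
from typing import Dict, List, Tuple


def build_consumer_config(
    bootstrap_servers: List[str],
    group_id: str,
    truststore_path: str,
    truststore_password: str,
) -> Dict[str, str]:
    """Return standard Kafka consumer properties for a TLS listener."""
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    return {
        # Private on-prem addresses, reached over the VPN/Interconnect.
        "bootstrap.servers": ",".join(bootstrap_servers),
        "group.id": group_id,
        # Encrypt broker connections with TLS.
        "security.protocol": "SSL",
        # Assumption: this file exists at this path on every worker.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
        # With no committed offset for the group, start at the newest records.
        "auto.offset.reset": "latest",
    }


def parse_record(kv: Tuple[bytes, bytes]) -> dict:
    """Decode a (key, value) pair from ReadFromKafka; the value is assumed to be JSON."""
    _key, value = kv
    return json.loads(value.decode("utf-8"))


def run(consumer_config: Dict[str, str], topic: str, table: str, **pipeline_flags) -> None:
    """Build and run the streaming pipeline (requires apache-beam[gcp])."""
    # Imported lazily so the helpers above work without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True, **pipeline_flags)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromOnPremKafka" >> ReadFromKafka(
                consumer_config=consumer_config, topics=[topic]
            )
            | "ParseJson" >> beam.Map(parse_record)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,  # "project:dataset.table"; assumed to already exist
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

To run this on Dataflow, `pipeline_flags` would typically include `runner="DataflowRunner"`, `project`, `region`, and `temp_location`. It should also include a `subnetwork` whose routes reach the on-prem network, and `use_public_ips=False` so the workers use only private addresses. Whether your VPC and firewall setup matches these assumptions needs to be checked against your own environment.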
The other options fall short in different ways:

- Publishing from on-prem producers directly to Pub/Sub violates the security constraint.
- Running Pub/Sub Lite on-premises introduces an additional broker that must be managed.
- Exporting to Cloud Storage first turns the workload into periodic batch loads, which increases both latency and complexity.
GCP Professional Data Engineer
Ingesting and processing the data