GCP Professional Data Engineer Practice Question

Your organization receives 4 TB of JSON telemetry each day from hundreds of thousands of IoT devices. The events must be filtered for malformed records, deduplicated on device ID and event timestamp, enriched with a 100 MB reference table that is updated weekly, and streamed into partitioned BigQuery tables with sub-minute latency. Data engineers also need to rerun the identical logic over months of archived data for occasional backfills. Operations teams require automatic horizontal scaling and no cluster management. Which Google Cloud solution best satisfies all requirements?

Load raw events directly into BigQuery with streaming inserts and use Dataform SQL models to cleanse, deduplicate, and enrich data in place.
Implement an Apache Beam pipeline on Cloud Dataflow that streams events, uses a weekly-refreshed side input for enrichment, and can be re-run in batch for backfills.
Create a Cloud Data Fusion pipeline with Wrangler transforms, triggered by Cloud Composer, and manually scale the underlying Dataproc cluster for spikes.
Run Spark Streaming on a long-running Cloud Dataproc cluster, coupled with scheduled Dataprep jobs for cleaning and a separate batch Spark job for backfills.
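The Dataflow option rests on Apache Beam's model of writing the transform logic once and running it either as a streaming job or as a batch backfill. The sketch below is a plain-Python illustration of that per-record logic (filter malformed records, deduplicate on device ID and timestamp, enrich from a reference table), not a Beam pipeline. The field names `device_id`, `timestamp`, and `model` are hypothetical, and the 100 MB reference table is modeled as an in-memory dict.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical field names; the question does not specify a schema.
KEY_FIELDS = ("device_id", "timestamp")


def parse_event(line: str) -> Optional[dict]:
    """Parse one JSON record; return None if it is malformed."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    # Reject records whose key fields are missing or not scalar (unhashable).
    if not all(isinstance(event.get(f), (str, int)) for f in KEY_FIELDS):
        return None
    return event


def dedupe(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the first event seen for each (device_id, timestamp) pair."""
    seen = set()
    for event in events:
        key = (event["device_id"], event["timestamp"])
        if key not in seen:
            seen.add(key)
            yield event


def enrich(event: dict, reference: Dict[str, str]) -> dict:
    """Attach the device model from the reference table (None if unknown)."""
    return {**event, "model": reference.get(event["device_id"])}


def process(lines: Iterable[str], reference: Dict[str, str]) -> List[dict]:
    """Runner-agnostic: the same functions could back a live or a backfill run."""
    parsed = (parse_event(line) for line in lines)
    valid = (event for event in parsed if event is not None)
    return [enrich(event, reference) for event in dedupe(valid)]


SAMPLE_LINES = [
    '{"device_id": "d1", "timestamp": "2024-01-01T00:00:00Z", "temp": 21.5}',
    "not json",
    '{"device_id": "d1", "timestamp": "2024-01-01T00:00:00Z", "temp": 21.5}',
    '{"device_id": "d2", "timestamp": "2024-01-01T00:00:05Z"}',
    '{"timestamp": "2024-01-01T00:00:09Z"}',
]
SAMPLE_REFERENCE = {"d1": "sensor-a"}  # stands in for the weekly reference table

results = process(SAMPLE_LINES, SAMPLE_REFERENCE)
for row in results:
    print(row)
```

In a Beam pipeline, functions like these would be wrapped in transforms such as `beam.Filter` and `beam.Map`, with the reference table supplied as a side input (for example via `beam.pvalue.AsDict`). Unlike the unbounded in-memory set used here, streaming deduplication has to be bounded, typically per window or with per-key state and timers.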

GCP Professional Data Engineer

Designing data processing systems
