
GCP Professional Data Engineer Practice Question

Your organization receives 4 TB of JSON telemetry each day from hundreds of thousands of IoT devices. The events must be filtered for malformed records, deduplicated on the device ID and timestamp pair, enriched with a 100 MB reference table that updates weekly, and streamed into partitioned BigQuery tables with sub-minute latency. Data engineers also need to rerun the identical logic over months of archived data for occasional backfills. Operations teams require automatic horizontal scaling and no cluster management. Which Google Cloud solution best satisfies all requirements?

  • Load raw events directly into BigQuery with streaming inserts and use Dataform SQL models to cleanse, deduplicate, and enrich data in place.

  • Implement an Apache Beam pipeline on Cloud Dataflow that streams events, uses a weekly-refreshed side input for enrichment, and can be re-run in batch for backfills.

  • Create a Cloud Data Fusion pipeline with Wrangler transforms, triggered by Cloud Composer, and manually scale the underlying Dataproc cluster for spikes.

  • Run Spark Streaming on a long-running Cloud Dataproc cluster, coupled with scheduled Dataprep jobs for cleaning and a separate batch Spark job for backfills.
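The record-level logic described in the Apache Beam option (filter malformed records, deduplicate on device ID and timestamp, enrich from a small reference table used like a side input) can be sketched in plain Python. This is a minimal illustration of the per-record steps, not actual Beam API code; the field names (`device_id`, `timestamp`) and helper names are assumptions for the example.

```python
import json

def parse_and_filter(raw_records):
    """Drop malformed JSON and records missing required fields."""
    for raw in raw_records:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # filtered out: malformed record
        if "device_id" in event and "timestamp" in event:
            yield event

def dedupe_by_device_timestamp(events):
    """Keep only the first event seen per (device_id, timestamp) key."""
    seen = set()
    for event in events:
        key = (event["device_id"], event["timestamp"])
        if key not in seen:
            seen.add(key)
            yield event

def enrich(events, reference):
    """Merge each event with its row from a small in-memory reference
    table (the role a weekly-refreshed side input plays in Beam)."""
    for event in events:
        yield {**event, **reference.get(event["device_id"], {})}

# Example run over a handful of raw records.
reference = {"dev-1": {"region": "us-central1"}}
raw = [
    '{"device_id": "dev-1", "timestamp": 1, "temp": 20}',
    'not json',                                            # malformed
    '{"device_id": "dev-1", "timestamp": 1, "temp": 20}',  # duplicate
    '{"device_id": "dev-2", "timestamp": 2, "temp": 21}',
]
events = list(enrich(dedupe_by_device_timestamp(parse_and_filter(raw)), reference))
```

In a real Beam pipeline the same three stages would be `ParDo`/`Distinct`-style transforms, the reference table would be a side input refreshed weekly, and the final step would write to partitioned BigQuery tables; running the identical pipeline in batch mode over archived data is what enables the backfill requirement.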

GCP Professional Data Engineer
Designing data processing systems