GCP Professional Data Engineer Practice Question

Your team ingests 15 TB of compressed application logs into Cloud Storage every night and immediately loads the data into a BigQuery staging table. A batch Dataflow pipeline then executes a series of SQL-like joins, filters, and aggregations before writing the daily results into BigQuery reporting tables. The Dataflow job's worker and shuffle costs have grown significantly, and the team wants to reduce operational overhead while keeping the transformation logic in ANSI-compatible SQL under version control. What should you recommend?

  • Re-implement the pipeline with Dataflow SQL templates and trigger them nightly with Cloud Scheduler.

  • Move the transformation logic into BigQuery by creating version-controlled SQL files managed with Dataform or scheduled queries, and drop the Dataflow job.

  • Keep the Dataflow pipeline but orchestrate it with Cloud Data Fusion to simplify management.

  • Replace the Dataflow job with a Dataproc cluster that runs Spark SQL notebooks scheduled by Cloud Composer.
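To make the BigQuery-native approach concrete, here is a minimal sketch of what a version-controlled Dataform definition for this workload might look like. The file name, dataset names, and column names (`event_timestamp`, `service`, `latency_ms`) are hypothetical placeholders, not part of the scenario above:

```sql
-- definitions/daily_report.sqlx (hypothetical file, kept in Git alongside the rest of the repo)
config {
  type: "table",
  schema: "reporting",  -- assumed target dataset for the daily results
  description: "Daily aggregates derived from the nightly staging load"
}

SELECT
  DATE(event_timestamp) AS report_date,   -- assumed timestamp column in the staging table
  service,                                -- assumed grouping dimension
  COUNT(*) AS event_count,
  AVG(latency_ms) AS avg_latency_ms       -- assumed metric column
FROM ${ref("staging_logs")}               -- ref() resolves the staging table and records lineage
GROUP BY report_date, service
```

Because the transformation runs entirely inside BigQuery, there are no Dataflow workers or shuffle costs to pay for, and the SQL lives in ordinary files that can be reviewed and versioned like any other code.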

Ingesting and processing the data