GCP Professional Data Engineer Practice Question

A retail analytics team runs a Spark-based ETL workflow each night that processes 8 TB of sales logs stored in Cloud Storage. The job finishes in about four hours, and there is no other Spark workload during the day. Security policy requires that each run use a fresh environment so that no leftover libraries or temporary files persist after completion. The team's main goal is to eliminate the roughly 20 hours of daily idle cost on their current always-on 20-node Dataproc cluster while still giving every nightly run the exact Spark configuration it needs. Which approach best meets these requirements?

  • Use a Dataproc Workflow Template that creates a job-scoped (ephemeral) cluster, runs the Spark ETL job with Cloud Storage as the default file system, and deletes the cluster automatically after the workflow succeeds or fails.

  • Rewrite the nightly pipeline as scheduled queries in BigQuery and drop Dataproc altogether.

  • Convert the Spark job to a streaming Dataflow pipeline launched from a Flex Template, allowing Dataflow to scale workers down after processing completes.

  • Keep the existing cluster but enable autoscaling with preemptible secondary workers to shrink the cluster to zero workers when idle.
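For context on what the workflow-template option looks like in practice, here is a minimal sketch using the google-cloud-dataproc Python client. It registers a template whose placement is a managed (job-scoped) cluster and then instantiates it for a run; the project ID, region, bucket, jar path, main class, and machine sizes are illustrative placeholders, not values from the question.

```python
# Minimal sketch: a Dataproc Workflow Template with a managed (ephemeral) cluster.
# Dataproc creates the cluster, runs the Spark step, and deletes the cluster when
# the workflow finishes, whether it succeeds or fails.
from google.cloud import dataproc_v1 as dataproc

project_id = "my-project"   # hypothetical project
region = "us-central1"      # hypothetical region
parent = f"projects/{project_id}/regions/{region}"

client = dataproc.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

template = {
    "id": "nightly-sales-etl",
    "placement": {
        "managed_cluster": {                      # job-scoped cluster, created per run
            "cluster_name": "nightly-etl",
            "config": {
                "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-8"},
                "worker_config": {"num_instances": 20, "machine_type_uri": "n1-standard-8"},
            },
        }
    },
    "jobs": [
        {
            "step_id": "sales-etl",
            "spark_job": {                                        # reads/writes gs:// paths directly
                "main_class": "com.example.SalesEtl",             # hypothetical class
                "jar_file_uris": ["gs://my-bucket/jars/sales-etl.jar"],
                "args": ["gs://my-bucket/sales-logs/", "gs://my-bucket/curated/"],
            },
        }
    ],
}

# One-time setup: register the template.
client.create_workflow_template(parent=parent, template=template)

# Nightly (e.g. from Cloud Scheduler or Composer): instantiate the template.
# The cluster exists only for the ~4-hour run, so there is no idle cost and
# every run starts from a clean environment with the same Spark configuration.
operation = client.instantiate_workflow_template(
    name=f"{parent}/workflowTemplates/nightly-sales-etl"
)
operation.result()  # blocks until the workflow ends and the managed cluster is deleted
```

Because the template pins the cluster configuration, each nightly run gets an identical Spark environment, and nothing persists between runs once the managed cluster is torn down.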

Exam: GCP Professional Data Engineer
Objective: Maintaining and automating data workloads