GCP Professional Data Engineer Practice Question

Your company runs a Spark-based ETL that processes about 4 TB of logs every night. The job takes 40 minutes and has a strict SLA but no interactivity requirements. During business hours, data scientists occasionally submit short interactive Spark SQL queries (5-10 minutes each) for troubleshooting. The current always-on 20-node Dataproc cluster sits idle most of the time. The CIO demands a 70% cost reduction without hurting either workload. Which redesign best satisfies the goal?

  • Migrate the nightly ETL to an ephemeral Dataproc job cluster that terminates on completion, and keep a minimal two-node persistent cluster dedicated to the interactive Spark SQL queries.

  • Enable autoscaling and preemptible workers on the existing 20-node persistent cluster so it scales down when idle but remains available for both workloads.

  • Retire Dataproc and load the logs into BigQuery, running both the nightly ETL and interactive analysis as SQL queries with on-demand pricing.

  • Replace the single cluster with a new per-job Dataproc cluster for every workload, including the interactive queries, deleting each cluster immediately after the query finishes.
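To reason about whether a redesign can clear the 70% target, it helps to compare node-hours. The sketch below is a back-of-the-envelope estimate for the ephemeral-job-cluster pattern from the first option; the flat per-node-hour rate is hypothetical, and the 40-minute ETL is rounded up to one billed hour for simplicity.

```python
# Hypothetical flat rate for illustration only; real Dataproc pricing
# varies by machine type and includes per-vCPU Dataproc fees.
node_hour_cost = 1.0  # assumed cost per node-hour

# Current design: 20 nodes always on, 24 hours a day.
current_daily = 20 * 24 * node_hour_cost  # 480 node-hours/day

# Redesign: ephemeral 20-node cluster for the nightly ETL (~40 min,
# rounded up to 1 hour), plus a 2-node persistent cluster kept up
# all day for interactive Spark SQL queries.
redesign_daily = 20 * 1 * node_hour_cost + 2 * 24 * node_hour_cost  # 68 node-hours/day

reduction = 1 - redesign_daily / current_daily
print(f"{reduction:.0%}")  # → 86%
```

Under these assumptions the node-hour spend drops roughly 86%, comfortably past the 70% goal, while each workload keeps a cluster sized and timed to its own profile.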

Maintaining and automating data workloads