GCP Professional Data Engineer Practice Question

Your analytics team executes four large Spark batch ETL jobs every night, each with different libraries and executor memory requirements. During business hours, data scientists occasionally run short interactive Hive queries that must return within minutes. You want to minimize Dataproc costs without sacrificing performance or the isolation of the nightly jobs from one another. Which strategy best meets these goals?

  • Submit each nightly batch job to its own ephemeral Dataproc cluster and delete the cluster on completion; maintain a small persistent cluster for interactive queries.

  • Run both the batch jobs and interactive queries on a single persistent Dataproc cluster with autoscaling disabled to avoid provisioning delays.

  • Provision a separate always-on persistent Dataproc cluster for each nightly batch job to guarantee resource isolation, and shut them down in the morning.

  • Keep an always-on persistent cluster sized for the nightly batch peak, and launch short-lived job-based clusters only for interactive queries.
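
To make the "ephemeral cluster per job" pattern named in the first option concrete, here is a minimal sketch that only builds the gcloud command lines for one nightly job: create a cluster, submit the Spark job, delete the cluster. It is an illustration under stated assumptions, not a recommended production setup. The job name, main class, JAR path, region, and worker count are all hypothetical, and nothing is executed against GCP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NightlyJob:
    # All field values used below are hypothetical placeholders.
    name: str
    main_class: str
    jar_uri: str
    executor_memory: str
    num_workers: int


def ephemeral_cluster_commands(job: NightlyJob, region: str) -> List[List[str]]:
    """Return create / submit / delete gcloud commands for one job.

    Each job gets its own cluster, so library and memory settings
    do not have to be shared with the other nightly jobs.
    """
    cluster = f"etl-{job.name}"  # assumed naming scheme
    create = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--num-workers={job.num_workers}",
    ]
    submit = [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={job.main_class}",
        f"--jars={job.jar_uri}",
        f"--properties=spark.executor.memory={job.executor_memory}",
    ]
    delete = [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}",
        "--quiet",  # skip the interactive confirmation prompt
    ]
    return [create, submit, delete]


if __name__ == "__main__":
    job = NightlyJob("orders", "com.example.OrdersEtl",
                     "gs://example-bucket/orders-etl.jar", "8g", 4)
    for cmd in ephemeral_cluster_commands(job, "us-central1"):
        print(" ".join(cmd))
```

In practice the three steps would be run in order by a scheduler or orchestration tool, once per nightly job, so that no batch cluster exists (or is billed) outside its job's run window.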

Maintaining and automating data workloads