GCP Professional Data Engineer Practice Question

A retailer runs a 50-node on-prem Hadoop cluster that supports ad-hoc Hive queries for data analysts during the day and Spark ETL jobs overnight. Management wants a fast lift-and-shift to Google Cloud that keeps the existing Hive metastore and avoids rewriting job submission scripts. They accept some idle cost in the short term but want the option to manually scale down workers during quiet periods. Which Dataproc deployment approach best satisfies these requirements while modernization is planned?

  • Rewrite the Hive queries as BigQuery SQL and migrate all data directly into BigQuery to eliminate cluster management entirely.

  • Create a single persistent Dataproc cluster sized similarly to the on-prem environment, store data in Cloud Storage, and manually resize the cluster when workload volumes change.

  • Use Dataproc Serverless for Spark, submitting every batch and interactive job through the serverless API instead of managing clusters.

  • Refactor each Spark and Hive workload to launch its own ephemeral Dataproc cluster that is created at job start and deleted on completion.
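The lift-and-shift approach described in the second option can be sketched with `gcloud` as follows. The cluster name, region, machine types, and worker counts below are illustrative placeholders, not values from the scenario:

```shell
# Create a long-running Dataproc cluster sized close to the 50-node
# on-prem environment (1 master + 48 workers here as a placeholder).
gcloud dataproc clusters create retail-lift-shift \
    --region=us-central1 \
    --master-machine-type=n2-standard-8 \
    --worker-machine-type=n2-standard-8 \
    --num-workers=48

# Manually scale down primary workers during a quiet period,
# then scale back up before the overnight ETL window.
gcloud dataproc clusters update retail-lift-shift \
    --region=us-central1 \
    --num-workers=10
```

Because the cluster is persistent, existing Hive metastore configuration and job submission scripts can be pointed at it with minimal changes, and data kept in Cloud Storage survives any resize.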

Objective: Maintaining and automating data workloads