GCP Professional Data Engineer Practice Question

A retailer runs a 50-node on-prem Hadoop cluster that supports ad-hoc Hive queries for data analysts during the day and Spark ETL jobs overnight. Management wants a fast lift-and-shift to Google Cloud that keeps the existing Hive metastore and avoids rewriting job submission scripts. They accept some idle cost in the short term but want the option to manually scale down workers during quiet periods. Which Dataproc deployment approach best satisfies these requirements while modernization is planned?

  • Rewrite the Hive queries as BigQuery SQL and migrate all data directly into BigQuery to eliminate cluster management entirely.

  • Create a single persistent Dataproc cluster sized similarly to the on-prem environment, store data in Cloud Storage, and manually resize the cluster when workload volumes change.

  • Use Dataproc Serverless for Spark, submitting every batch and interactive job through the serverless API instead of managing clusters.

  • Refactor each Spark and Hive workload to launch its own ephemeral Dataproc cluster that is created at job start and deleted on completion.
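The lift-and-shift approach described in the second option can be sketched with `gcloud` as follows. The cluster name, region, machine types, and worker counts below are illustrative placeholders, not values from the scenario:

```shell
# Create a long-running Dataproc cluster sized close to the 50-node
# on-prem environment (1 master + 48 workers here as a placeholder).
gcloud dataproc clusters create retail-lift-shift \
    --region=us-central1 \
    --master-machine-type=n2-standard-8 \
    --worker-machine-type=n2-standard-8 \
    --num-workers=48

# Manually scale down primary workers during a quiet period,
# then scale back up before the overnight ETL window.
gcloud dataproc clusters update retail-lift-shift \
    --region=us-central1 \
    --num-workers=10
```

Because the cluster is persistent, existing Hive metastore configuration and job submission scripts can be pointed at it with minimal changes, and data kept in Cloud Storage survives any resize.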

Objective: Maintaining and automating data workloads