GCP Professional Data Engineer Practice Question

Your team operates a 20-node persistent Dataproc cluster, each worker provisioned with 2 TB of persistent disk. A nightly Spark batch job ingests about 1 TB of new data, writes aggregated results, and then the cluster sits idle for roughly 20 hours. Analysts still need access to both the raw and aggregated data until the next run. Which redesign will most effectively reduce storage costs while preserving data availability between jobs?

Retain the persistent cluster and add preemptible secondary workers so nodes can be released during idle periods while HDFS replicas stay on primary workers.
Migrate the workload to BigQuery for storage and querying, but retain the Dataproc cluster for transformations that BigQuery cannot perform.
Switch to a job-scoped (ephemeral) Dataproc cluster configured to use a regional Cloud Storage bucket as the default file system, storing input and output data in the bucket and deleting the cluster after each run.
Shrink worker persistent disks to 100 GB and add local SSDs for shuffle spill, keeping the cluster running so HDFS holds data for analysts.
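
The storage arithmetic the options turn on can be sketched roughly as follows. Persistent disks are billed on provisioned capacity whether or not the cluster is busy, while Cloud Storage is billed on bytes actually stored. The cluster size and disk size come from the question; the per-GB prices and the retained data volume are hypothetical placeholders chosen for illustration, not quoted Google Cloud pricing, so substitute current figures before drawing conclusions.

```python
def monthly_storage_cost(capacity_gb: float, price_per_gb_month: float) -> float:
    """Return the monthly cost of holding capacity_gb at a flat per-GB price."""
    return capacity_gb * price_per_gb_month


# Figures taken from the question.
WORKERS = 20
DISK_PER_WORKER_GB = 2 * 1024  # 2 TB of persistent disk per worker

# Assumptions for illustration only (not real price quotes).
ASSUMED_PD_PRICE_PER_GB_MONTH = 0.04   # hypothetical persistent disk price
ASSUMED_GCS_PRICE_PER_GB_MONTH = 0.02  # hypothetical regional bucket price
ASSUMED_RETAINED_DATA_GB = 2 * 1024    # hypothetical raw + aggregated data kept between runs

# Persistent disks: pay for everything provisioned, idle or not.
provisioned_gb = WORKERS * DISK_PER_WORKER_GB
pd_cost = monthly_storage_cost(provisioned_gb, ASSUMED_PD_PRICE_PER_GB_MONTH)

# Cloud Storage: pay only for the data actually retained.
gcs_cost = monthly_storage_cost(ASSUMED_RETAINED_DATA_GB, ASSUMED_GCS_PRICE_PER_GB_MONTH)

print(f"Provisioned persistent disk: {provisioned_gb} GB")
print(f"Persistent disk cost per month (assumed price): {pd_cost:.2f}")
print(f"Cloud Storage cost per month (assumed price):   {gcs_cost:.2f}")
```

Under these assumptions the provisioned disk footprint is 20 times the retained data volume, so the gap holds across a wide range of plausible prices. The sketch covers storage only; compute savings from deleting an idle cluster would come on top.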

GCP Professional Data Engineer

Maintaining and automating data workloads
