GCP Professional Data Engineer Practice Question

Your data engineering team operates a 50-node Dataproc cluster that runs a nightly Spark ETL job on clickstream data for about two hours. The rest of the day the cluster is largely idle, except for occasional ad-hoc Hive queries from analysts who can wait a few minutes for results to start returning. Management asks you to lower compute costs while keeping existing SLAs. What approach should you take?

Retire Dataproc and rewrite the Spark pipelines as BigQuery SQL, using only on-demand query pricing.
Keep the existing persistent cluster but attach an autoscaling policy so all workers can scale down to zero when idle.
Run each ETL and ad-hoc workload on an ephemeral Dataproc cluster that is created when the job is submitted and deleted when it completes.
Keep the persistent cluster and convert all worker nodes to preemptible VMs while leaving master nodes unchanged.

GCP Professional Data Engineer

Maintaining and automating data workloads

Your Score:

Bash, the Crucial Exams Chat Bot

AI Bot

GCP Professional Data Engineer Practice Question

Answer Description

Ask Bash

What is an ephemeral Dataproc cluster?

How does autoscaling work on Dataproc clusters?

What are preemptible VMs, and why might they not be ideal for this use case?

What is an ephemeral Dataproc cluster?

How does Dataproc autoscaling work?

What are preemptible VMs on Dataproc?

Monthly

$19.99 $11.99

Billed monthly,
Cancel any time.

3 Month Pass

$44.99 $26.99

One time purchase of $26.99,
Does not auto-renew.

Annual Pass

$119.99 $71.99

One time purchase of $71.99,
Does not auto-renew.

Lifetime Pass

$189.99 $113.99

One time purchase,
Good for life.

All Exams

Unlimited Tests

Unlimited Questions

AI Tutor

Track scores

Report Cards

Voucher Discounts

Advanced PBQs

Included Exams

GCP Professional Data Engineer Practice Question

Report Issue

Answer Description

Ask Bash

What is an ephemeral Dataproc cluster?

How does autoscaling work on Dataproc clusters?

What are preemptible VMs, and why might they not be ideal for this use case?

What is an ephemeral Dataproc cluster?

How does Dataproc autoscaling work?

What are preemptible VMs on Dataproc?

Report Issue