GCP Professional Data Engineer Practice Question

Your organization operates a 30-node Dataproc cluster that keeps 300 TB of data in HDFS on balanced persistent disks. The cluster is idle more than 16 hours a day, yet you must preserve all historical data for future reprocessing and analytics. Leadership asks you to cut monthly storage costs without increasing operational complexity or sacrificing data durability. Which change will best achieve this goal?

  • Migrate the HDFS data to a regional Cloud Storage bucket and recreate the Dataproc cluster with minimal or no attached persistent disks, configuring the cluster to use the bucket as its primary Hadoop-compatible file system (see the sketch after this list).

  • Keep the persistent cluster but lower the HDFS replication factor from 2 to 1 and replace balanced persistent disks with smaller local SSDs for shuffle storage.

  • Schedule a daily cron job to stop the cluster during idle hours while leaving the existing HDFS data on attached persistent disks.

  • Retain the current persistent cluster and move the data to a Filestore instance mounted on all workers to free up HDFS space.
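For context on the first option: Dataproc ships with the Cloud Storage connector, so a rebuilt cluster can treat a bucket as its default Hadoop file system and carry only small boot disks. Below is a minimal sketch using the google-cloud-dataproc Python client; the project, region, bucket, and cluster names are hypothetical placeholders, and the node counts and disk sizes are illustrative only. The one-time copy of the existing 300 TB out of HDFS would typically be a separate step (for example, a hadoop distcp job) and is not shown here.

```python
from google.cloud import dataproc_v1

# Hypothetical identifiers -- replace with your own values.
PROJECT = "my-project"
REGION = "us-central1"
BUCKET = "my-org-datalake"  # regional Cloud Storage bucket holding the migrated data

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT,
    "cluster_name": "ephemeral-analytics",
    "config": {
        # Point the default Hadoop file system at the bucket instead of HDFS.
        "software_config": {
            "properties": {"core:fs.defaultFS": f"gs://{BUCKET}"}
        },
        # Small boot disks only: no 300 TB of HDFS capacity to provision.
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": "n2-standard-4",
            "disk_config": {"boot_disk_type": "pd-balanced", "boot_disk_size_gb": 100},
        },
        "worker_config": {
            "num_instances": 2,
            "machine_type_uri": "n2-standard-4",
            "disk_config": {"boot_disk_type": "pd-balanced", "boot_disk_size_gb": 100},
        },
    },
}

# create_cluster returns a long-running operation; result() blocks until done.
operation = client.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
)
print(operation.result().cluster_name)
```

In practice, many teams leave fs.defaultFS alone and simply reference gs:// URIs in job arguments; either way, the data now lives in Cloud Storage independently of any cluster, so the cluster itself can be deleted or recreated on demand during the long idle windows.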

Objective: Maintaining and automating data workloads