Your organization operates a 30-node Dataproc cluster that keeps 300 TB of data in HDFS on balanced persistent disks. The cluster is idle more than 16 hours a day, yet you must preserve all historical data for future reprocessing and analytics. Leadership asks you to cut monthly storage costs without increasing operational complexity or sacrificing data durability. Which change will best achieve this goal?
Schedule a daily cron job to stop the cluster during idle hours while leaving the existing HDFS data on attached persistent disks.
Keep the persistent cluster but lower the HDFS replication factor from 2 to 1 and replace balanced persistent disks with smaller local SSDs for shuffle storage.
Retain the current persistent cluster and move the data to a Filestore instance mounted on all workers to free up HDFS space.
Migrate the HDFS data to a regional Cloud Storage bucket and recreate the Dataproc cluster with minimal or no attached persistent disks, configuring the cluster to use the bucket as its primary Hadoop-compatible file system.
Storing long-lived data sets in HDFS keeps dozens of persistent disks attached to always-running VMs, so you pay for block-storage capacity even when the cluster is idle. Dataproc can treat Cloud Storage as its Hadoop-compatible file system (HCFS). By moving the 300 TB data lake to a Cloud Storage bucket and deleting most worker data disks, you pay much lower object-storage rates and avoid paying for idle persistent disks while retaining eleven-9s durability. Lowering the HDFS replication factor or switching to local SSDs might save some cost but still requires disks on VMs that run (and bill) 24×7. Filestore provides POSIX semantics but is significantly more expensive per GiB than Cloud Storage, so it will not minimize costs for large analytical data sets.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
Why is Cloud Storage preferred over HDFS for this scenario?
Open an interactive chat with Bash
What does eleven-9s durability mean?
Open an interactive chat with Bash
How does Dataproc integrate with Cloud Storage as HCFS?
Open an interactive chat with Bash
What is HDFS and why is it used in Dataproc?
Open an interactive chat with Bash
What are the benefits of using Cloud Storage as an HCFS with Dataproc?
Open an interactive chat with Bash
How does reducing the HDFS replication factor affect durability and cost?
Open an interactive chat with Bash
GCP Professional Data Engineer
Maintaining and automating data workloads
Your Score:
Report Issue
Bash, the Crucial Exams Chat Bot
AI Bot
Loading...
Loading...
Loading...
Pass with Confidence.
IT & Cybersecurity Package
You have hit the limits of our free tier, become a Premium Member today for unlimited access.
Military, Healthcare worker, Gov. employee or Teacher? See if you qualify for a Community Discount.
Monthly
$19.99
$19.99/mo
Billed monthly, Cancel any time.
3 Month Pass
$44.99
$14.99/mo
One time purchase of $44.99, Does not auto-renew.
MOST POPULAR
Annual Pass
$119.99
$9.99/mo
One time purchase of $119.99, Does not auto-renew.
BEST DEAL
Lifetime Pass
$189.99
One time purchase, Good for life.
What You Get
All IT & Cybersecurity Package plans include the following perks and exams .