Your company stores all raw and curated data for its enterprise-wide data lake in regional Cloud Storage buckets. A 40-minute Spark job must convert the previous day's 3 TB of application logs from JSON to partitioned Parquet each night. Leadership wants to pay for compute only while the transformation runs and to delete all cluster resources immediately afterward, without risking data loss or an extra data-copy step. Which design satisfies these requirements?
Attach local SSDs to each worker, copy the logs to the SSDs, perform the Spark conversion, and rely on VM snapshots to preserve the Parquet files when the cluster shuts down.
Launch a Dataproc cluster on demand, run the Spark job with input and output paths set to gs:// buckets, and configure the cluster to auto-delete immediately after the job completes.
Load the JSON logs into the cluster's HDFS, run the Spark conversion there, then copy the Parquet files back to Cloud Storage before manually deleting the cluster.
Create a long-running Dataproc cluster that persists the logs and Parquet output in Bigtable tables mounted on the cluster; shut down only the worker VMs overnight.
Reading the input logs from gs:// paths and writing the transformed Parquet files back to Cloud Storage lets the job rely on Cloud Storage's durable, decoupled object storage instead of cluster-local HDFS. Submitting the job to an ephemeral Dataproc cluster that is configured to auto-delete when the job finishes means the VMs (and their attached disks) exist only for the job's duration, so you pay for compute and persistent disks only while the processing runs. All data remains in Cloud Storage after the cluster is gone, eliminating extra data-copy steps and the risk of data loss.
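For illustration, a minimal PySpark sketch of such a nightly job might look like the following; the bucket names, dated prefix, and the `event_date` partition column are assumptions, not details given in the question.

```python
from pyspark.sql import SparkSession

# On an ephemeral Dataproc cluster, read raw JSON logs directly from Cloud Storage
# and write partitioned Parquet back to Cloud Storage; nothing is staged on HDFS.
spark = SparkSession.builder.appName("nightly-json-to-parquet").getOrCreate()

# Hypothetical layout: the previous day's logs live under a dated prefix.
raw_logs = spark.read.json("gs://example-raw-logs/dt=2024-01-01/*.json")

(
    raw_logs
    .write
    .mode("overwrite")
    .partitionBy("event_date")  # assumed partition column present in the logs
    .parquet("gs://example-curated/application_logs/")
)

spark.stop()
```

Because both paths are gs:// URIs, the output is durable the moment the write finishes, so the cluster can be deleted immediately afterward with no copy-back step.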
Staging data on HDFS or local SSDs (as in the other options) would require copying it back to Cloud Storage before deletion, or would lose it when the VMs are removed. Persisting data in Bigtable on a long-running cluster keeps those resources, and their costs, allocated continuously, defeating the pay-only-while-running goal.
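As a sketch of the auto-delete portion of the correct design: assuming the google-cloud-dataproc Python client library, an ephemeral cluster can be created with an idle-delete TTL so Dataproc removes the whole cluster shortly after the submitted job finishes. The project ID, region, machine sizing, and 10-minute TTL below are illustrative only.

```python
from google.cloud import dataproc_v1

PROJECT = "example-project"  # hypothetical project ID
REGION = "us-central1"       # hypothetical region

# Dataproc cluster operations use a regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT,
    "cluster_name": "nightly-json-to-parquet",  # illustrative name
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 8, "machine_type_uri": "n1-standard-8"},
        # Delete the cluster after 10 idle minutes, i.e. shortly after the
        # nightly Spark job completes, so compute costs stop automatically.
        "lifecycle_config": {"idle_delete_ttl": {"seconds": 600}},
    },
}

operation = client.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
)
operation.result()  # blocks until the cluster exists; submit the Spark job next
```

The same effect is commonly achieved from the command line with an idle-TTL flag on cluster creation or by running the job through a Dataproc workflow template, which provisions and tears down the cluster around the job.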