GCP Professional Data Engineer Practice Question

A retailer is migrating its on-premises Hadoop environment to Google Cloud. The data engineering team will run Spark Structured Streaming jobs that ingest inventory events 24×7 and must keep end-to-end latency under one minute. During the day, analysts connect with Hive to perform interactive, ad-hoc queries against the same datasets. The team needs the flexibility to install custom Hadoop libraries and is comfortable paying for always-on capacity to avoid startup delays. Which Dataproc deployment model best satisfies these requirements?

  • Maintain a single persistent Dataproc cluster that runs the streaming Spark jobs continuously and allows analysts to submit interactive Hive queries on demand.

  • Store data in BigQuery and replace Spark streaming with scheduled BigQuery queries, eliminating the need for any Dataproc cluster.

  • Schedule Cloud Composer to launch Dataproc Serverless Spark batches for both streaming ingestion and analyst queries.

  • Use a Dataproc workflow template that creates an ephemeral cluster for each Spark streaming job and deletes it when the job finishes.
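For context on the persistent-cluster deployment model described in the first option, the setup could be sketched with `gcloud` roughly as follows. The cluster name, region, worker count, bucket paths, job class, and table names below are all placeholders, not values from the question:

```shell
# Sketch: one long-running Dataproc cluster shared by streaming and
# interactive workloads. An initialization action installs the team's
# custom Hadoop libraries when nodes start (script path is hypothetical).
gcloud dataproc clusters create inventory-cluster \
    --region=us-central1 \
    --num-workers=3 \
    --initialization-actions=gs://example-bucket/install-custom-libs.sh

# The Spark Structured Streaming ingestion job runs 24x7 on the cluster
# (main class and jar are placeholders).
gcloud dataproc jobs submit spark \
    --cluster=inventory-cluster \
    --region=us-central1 \
    --class=com.example.InventoryStream \
    --jars=gs://example-bucket/inventory-stream.jar

# Analysts submit interactive Hive queries to the same cluster on demand.
gcloud dataproc jobs submit hive \
    --cluster=inventory-cluster \
    --region=us-central1 \
    --execute="SELECT sku, COUNT(*) FROM inventory_events GROUP BY sku"
```

Because the cluster is never torn down, there is no startup delay before either workload, which is the trade-off the team accepts by paying for always-on capacity.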

GCP Professional Data Engineer
Maintaining and automating data workloads