GCP Professional Data Engineer Practice Question

Your company processes 12 TB of IoT sensor data every night on an on-prem Hadoop cluster using PySpark jobs. The team must move to Google Cloud while keeping the existing Spark code unchanged, avoiding vendor lock-in so workloads can be repatriated later, and using open-source orchestration. Which Google Cloud design best meets these portability requirements while adding the ability to scale on demand?

  • Rewrite the pipelines in Apache Beam and execute them on Dataflow, scheduling executions with Workflows.

  • Use autoscaling Dataproc Spark clusters that read and write Parquet files in Cloud Storage, orchestrated end-to-end with Cloud Composer DAGs.

  • Containerize each Spark job and deploy on Cloud Run, triggering executions via Pub/Sub and coordinating with Cloud Scheduler.

  • Load historical data into BigQuery, stream new data with the BigQuery Streaming API, and schedule nightly SQL transformations with Dataform.
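
For context, here is a minimal sketch of what the Dataproc option above could look like in practice, written as a Cloud Composer (Airflow) DAG that submits the existing PySpark job to an autoscaling Dataproc cluster. The project, region, cluster, bucket, and script names are hypothetical placeholders, and the sketch assumes the cluster already exists with an autoscaling policy attached.

```python
# Illustrative Cloud Composer (Airflow 2.4+) DAG: nightly submission of an
# unchanged PySpark job to an autoscaling Dataproc cluster. All resource
# names below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PROJECT_ID = "my-project"         # hypothetical project
REGION = "us-central1"            # hypothetical region
CLUSTER_NAME = "iot-etl-cluster"  # assumed pre-created autoscaling cluster

# The existing PySpark script, staged as-is in Cloud Storage; it reads raw
# sensor data and writes Parquet back to Cloud Storage (paths hypothetical).
PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {
        "main_python_file_uri": "gs://my-bucket/jobs/nightly_etl.py",
        "args": ["gs://my-bucket/raw/", "gs://my-bucket/parquet/"],
    },
}

with DAG(
    dag_id="nightly_iot_etl",
    schedule="0 2 * * *",  # run once per night
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    DataprocSubmitJobOperator(
        task_id="run_nightly_pyspark",
        project_id=PROJECT_ID,
        region=REGION,
        job=PYSPARK_JOB,
    )
```

Because the job stays plain PySpark reading and writing Parquet on object storage, and the orchestrator is open-source Apache Airflow, the same code and DAG could later be repointed at an on-prem Spark cluster with minimal changes, which is what the portability requirement is driving at.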

Exam: GCP Professional Data Engineer
Objective: Designing data processing systems