GCP Professional Data Engineer Practice Question

Your company has three regional Google Cloud projects where raw log and ad-impression CSVs land hourly in Cloud Storage buckets, and those buckets must remain the primary data lake. The central analytics team needs a single searchable catalog across all files, including automatic schema discovery and profiling, and wants to avoid ongoing engineering work whenever new buckets or folder paths appear. Which architecture meets these goals while following Google Cloud's data governance best practices?

  • Use hourly Dataflow jobs to load all incoming files into a single multi-region BigQuery dataset and let analysts search the dataset through Data Catalog.

  • Enable Cloud Asset Inventory exports for each project, write the bucket metadata to BigQuery, and expose a custom Looker dashboard for analysts to locate files.

  • Mount each bucket on a GKE cluster via Cloud Storage FUSE and run an open-source metadata crawler nightly to populate a self-hosted catalog service.

  • Create a Dataplex lake spanning the three projects, register every bucket as a managed asset in a raw zone with auto-discovery enabled, and grant analysts access to the resulting catalog entries.

Exam: GCP Professional Data Engineer
Objective: Designing data processing systems
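
For context on the Dataplex architecture described in the final option, the sketch below shows roughly how an existing regional bucket could be registered as a managed asset in a raw zone with auto-discovery enabled, using the google-cloud-dataplex Python client. The project, lake, zone, asset, and bucket names are hypothetical placeholders, and the lake itself is assumed to have been created already; treat this as an illustrative sketch under those assumptions, not a complete setup.

```python
# Illustrative sketch: register a Cloud Storage bucket as a managed Dataplex
# asset in a raw zone with auto-discovery enabled. All resource names below
# (projects, lake, zone, asset, bucket) are hypothetical placeholders, and the
# lake "ads-logs-lake" is assumed to exist already.
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()

# The lake lives in a central governance project; its assets may reference
# buckets owned by the three regional projects.
lake_name = "projects/central-analytics/locations/us-central1/lakes/ads-logs-lake"

# Create a raw zone to hold the unprocessed CSV landing buckets.
zone = dataplex_v1.Zone(
    type_=dataplex_v1.Zone.Type.RAW,
    resource_spec=dataplex_v1.Zone.ResourceSpec(
        location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
    ),
    discovery_spec=dataplex_v1.Zone.DiscoverySpec(enabled=True),
)
client.create_zone(parent=lake_name, zone_id="raw-landing", zone=zone).result()

# Register one regional bucket as a managed asset; discovery keeps catalog
# entries current as new folders and files arrive, with no extra pipelines.
asset = dataplex_v1.Asset(
    resource_spec=dataplex_v1.Asset.ResourceSpec(
        name="projects/region-a-project/buckets/region-a-raw-logs",
        type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
    ),
    discovery_spec=dataplex_v1.Asset.DiscoverySpec(enabled=True),
)
created = client.create_asset(
    parent=f"{lake_name}/zones/raw-landing",
    asset_id="region-a-raw-logs",
    asset=asset,
).result()
print("Registered asset:", created.name)
```

Once discovery runs, Dataplex publishes metadata entries for the discovered CSV tables, so analysts can search the catalog without any per-bucket pipelines being built or maintained.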