GCP Professional Data Engineer Practice Question

A multinational retailer wants to modernize its on-premises analytics stack by moving to Google Cloud. Requirements are:

  • Land raw click-stream and IoT data (JSON, images) with virtually unlimited scale.
  • Provide sub-minute dashboards fed by a streaming pipeline.
  • Run monthly company-wide SQL analytics without managing infrastructure.
  • Enforce centralized security, data quality, and lineage controls, while letting regional business units own their datasets.

Which high-level design best satisfies these goals with minimal operational overhead?

  • Land raw data in Cloud Storage, register it in Dataplex raw zones, and process streams with Dataflow into curated BigQuery tables that are also governed by Dataplex.

  • Stream JSON payloads straight into a single BigQuery dataset, store images as Base64 strings, and manage access with manual dataset-level ACLs across regions.

  • Ingest all data directly into a global Spanner database to serve both real-time dashboards and analytical SQL queries, enforcing governance through IAM on Spanner tables.

  • Write raw events into Bigtable, schedule Cloud Composer DAGs to copy data into BigQuery, and use Data Catalog in each project for discovery and policy control.
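The first option's streaming leg (Dataflow transforming raw events into curated BigQuery rows) centers on a parse-and-validate step. Below is a minimal, hedged sketch of what that per-record transform might look like; the field names (`user_id`, `event_type`, `ts`) and the dead-letter convention are assumptions for illustration, not part of the question.

```python
import json
from datetime import datetime, timezone


def curate_event(raw: bytes):
    """Turn one raw click-stream payload (as landed in Cloud Storage or
    read from Pub/Sub) into a curated row for a governed BigQuery table.

    Returns None for malformed records so the pipeline can route them to
    a dead-letter location instead of failing the whole stream.
    Schema fields here are hypothetical examples.
    """
    try:
        event = json.loads(raw)
        return {
            "user_id": str(event["user_id"]),
            "event_type": event.get("event_type", "unknown"),
            # Normalize the epoch-seconds timestamp to ISO-8601 UTC.
            "event_ts": datetime.fromtimestamp(
                event["ts"], tz=timezone.utc
            ).isoformat(),
        }
    except (ValueError, KeyError, TypeError):
        return None
```

In an actual Dataflow (Apache Beam) pipeline, a function like this would sit in a `beam.Map` or `ParDo` between the source read and the BigQuery sink, with the output tables registered in a Dataplex curated zone for lineage and access control.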

Topic: Storing the data