Your company has three regional Google Cloud projects where raw log and ad-impression CSVs land hourly into Cloud Storage buckets that must remain the primary data lake. The central analytics team needs one searchable catalog across all files, including automatic schema discovery and profiling, and wants to avoid ongoing engineering work when new buckets or folder paths appear. Which architecture meets these goals while following Google Cloud's data governance best practices?
Use hourly Dataflow jobs to load all incoming files into a single multi-regional BigQuery dataset and let analysts search the dataset through Data Catalog.
Enable Cloud Asset Inventory exports for each project, write the bucket metadata to BigQuery, and expose a custom Looker dashboard for analysts to locate files.
Mount each bucket on a GKE cluster via Cloud Storage FUSE and run an open-source metadata crawler nightly to populate a self-hosted catalog service.
Create a Dataplex lake spanning the three projects, register every bucket as a managed asset in a raw zone with auto-discovery enabled, and grant analysts access to the resulting catalog entries.
Dataplex is designed to build governed data lakes that can span multiple projects and regions. Once a lake is created and each Cloud Storage bucket is registered as a managed asset in a raw zone, Dataplex continuously crawls the buckets, detects new objects, infers file schemas, computes data profiles, and automatically publishes the resulting technical metadata into the unified Data Catalog. Analysts from any project can then search and access the data through the same catalog entries once appropriate IAM permissions are granted.

The other options either require significant custom engineering or lack built-in schema inference and profiling. Running an open-source crawler on GKE increases operational overhead and does not integrate natively with Cloud IAM or Data Catalog. Moving all files into BigQuery with Dataflow breaks the requirement to keep Cloud Storage as the primary data lake and still needs ongoing pipeline maintenance for new buckets. Exporting Cloud Asset Inventory to a Looker dashboard captures only storage-object metadata; it provides no automated schema inference, data profiles, or catalog integration. Using Dataplex with auto-discovery and its integrated catalog is therefore the most appropriate choice.
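To make the setup concrete, the steps above can be scripted. The following is a minimal sketch using the google-cloud-dataplex Python client; every project ID, region, lake, zone, and bucket name is an illustrative placeholder rather than anything from the question, and the IAM grants that let the Dataplex service agent read cross-project buckets are omitted.

```python
# Minimal sketch, assuming the google-cloud-dataplex client library
# (pip install google-cloud-dataplex). All project, region, lake, zone,
# and bucket names below are illustrative placeholders.
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/analytics-hub/locations/us-central1"  # lake host project

# 1. Create the lake that the central analytics team will govern.
lake = client.create_lake(
    parent=parent,
    lake_id="ad-logs-lake",
    lake=dataplex_v1.Lake(display_name="Ad Logs Lake"),
).result()

# 2. Add a raw zone with discovery enabled, so schemas and profiles are
#    inferred automatically for files in any attached asset.
zone = client.create_zone(
    parent=lake.name,
    zone_id="raw-zone",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.RAW,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            # Asset locations must be compatible with this setting.
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION,
        ),
        discovery_spec=dataplex_v1.Zone.DiscoverySpec(enabled=True),
    ),
).result()

# 3. Register each bucket as a managed asset; new objects and folder
#    paths under a registered bucket are picked up without further work.
for project_id, bucket in [
    ("proj-us", "raw-logs-us"),
    ("proj-eu", "raw-logs-eu"),
    ("proj-apac", "raw-logs-apac"),
]:
    client.create_asset(
        parent=zone.name,
        asset_id=f"{bucket}-asset",
        asset=dataplex_v1.Asset(
            resource_spec=dataplex_v1.Asset.ResourceSpec(
                name=f"projects/{project_id}/buckets/{bucket}",
                type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
            ),
            discovery_spec=dataplex_v1.Asset.DiscoverySpec(enabled=True),
        ),
    ).result()
```

Once discovery has run, analysts can locate files through catalog search instead of browsing buckets. A sketch using the google-cloud-datacatalog client, again with placeholder project IDs and an assumed column name in the query:

```python
# Minimal sketch, assuming the google-cloud-datacatalog client library
# (pip install google-cloud-datacatalog); the project IDs and the
# impression_id column in the query are placeholders.
from google.cloud import datacatalog_v1

catalog = datacatalog_v1.DataCatalogClient()
scope = datacatalog_v1.SearchCatalogRequest.Scope(
    include_project_ids=["proj-us", "proj-eu", "proj-apac"],
)

# Find discovered entries whose inferred schema contains the column;
# results honor the caller's IAM permissions.
for result in catalog.search_catalog(scope=scope, query="column:impression_id"):
    print(result.relative_resource_name)
```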