Your organization currently ingests 100 TB of compressed Avro point-of-sale events each month into BigQuery. A new initiative will also capture store-camera video and clickstream JSON. The solution must: preserve all raw files for future reprocessing, minimize storage cost as data volume grows, let analysts query curated datasets with BigQuery without copying the data, and apply centralized governance and data discovery across raw and processed zones. Which design best meets these requirements?
Create separate BigQuery datasets for raw Avro records and for processed tables; upload video files as BYTES columns inside BigQuery tables and manage access through dataset-level IAM.
Ingest Avro logs, JSON, and video metadata directly into Cloud Bigtable column families, then schedule BigQuery Data Transfer Service jobs to copy hourly snapshots into BigQuery for reporting.
Land all incoming files in Cloud Storage buckets organized into raw and curated zones, register the buckets with Dataplex, use Dataflow to transform raw Avro and JSON into Parquet in the curated bucket, and expose the curated data to analysts through BigLake tables in BigQuery.
Mount a large Filestore instance as HDFS for Dataproc to store raw files; write transformed outputs into Cloud SQL instances for analysts and secure resources with VPC Service Controls.
Staging all raw files in Cloud Storage provides low-cost, highly durable, and virtually unlimited object storage for video, Avro, and JSON. Maintaining separate raw and curated buckets allows the originals to be retained while Dataflow converts Avro and JSON to Parquet for efficient analytics. Registering the buckets in Dataplex delivers a unified catalog and centralized security controls. Exposing the curated Parquet objects through BigLake tables lets BigQuery query the data in place, avoiding duplication. The alternative designs either store large binaries in systems not optimized for economical object retention, force additional data copies into BigQuery, or lack a governance layer that spans raw and processed zones.
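To make the "query in place" step concrete, here is a minimal sketch of how a BigLake table over the curated Parquet files might be declared. It only builds the BigQuery DDL string (`CREATE EXTERNAL TABLE ... WITH CONNECTION ... OPTIONS`) and does not call any Google Cloud API. The project, dataset, connection, and bucket names are hypothetical placeholders, not values from the question, and the helper `biglake_parquet_ddl` is an illustrative name.

```python
def biglake_parquet_ddl(table: str, connection: str, uri_prefix: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for a BigLake table over Parquet files.

    table       -- fully qualified table name (project.dataset.table), hypothetical here
    connection  -- Cloud resource connection (project.location.connection_id), hypothetical here
    uri_prefix  -- Cloud Storage folder in the curated zone holding the Parquet objects
    """
    # A single trailing wildcard selects every Parquet object under the folder.
    uri = uri_prefix.rstrip("/") + "/*.parquet"
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = 'PARQUET', uris = ['{uri}']);"
    )


# Assumed names for illustration only.
ddl = biglake_parquet_ddl(
    table="my-project.curated.pos_events",
    connection="my-project.us.lake-connection",
    uri_prefix="gs://example-curated-zone/pos_events/",
)
print(ddl)
```

Because the table is external, the Parquet objects stay in the curated bucket and BigQuery reads them through the connection rather than loading a second copy into BigQuery storage.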
GCP Professional Data Engineer
Storing the data