After migrating a 40-TB Oracle data mart to BigQuery using Datastream (CDC -> Cloud Storage) and Dataflow loads, you must prove before cut-over that every source row matches its BigQuery copy. The solution has to 1) scale across hundreds of tables without per-table coding, 2) surface any row-level mismatches, and 3) expose results to Cloud Monitoring for alerts. Which approach best meets these requirements?
Develop individual Dataflow pipelines for each table that calculate row hashes in Oracle and BigQuery, then compare the results and publish a metric.
Enable a built-in Datastream data-validation feature to generate checksum comparisons automatically and send the results to Cloud Logging.
Create final BigQuery snapshots and run manual EXCEPT queries against exported Oracle CSV files; record any differences in a spreadsheet.
Run Google's open-source Data Validation Tool as a Dataflow flex template to compute per-table checksums between Oracle and BigQuery, log results to Cloud Logging, and create log-based metrics for Cloud Monitoring alerts.
Running Google's open-source Data Validation Tool (DVT) as a Dataflow flex template meets all three requirements. A single YAML configuration drives the comparison of row counts and column-level checksums between Oracle and BigQuery for every table, so no table-specific code is required, and DVT's row-level hash validation, keyed on primary keys, identifies the individual rows that differ. It writes detailed match and mismatch results to BigQuery tables and Cloud Logging, from which log-based metrics can feed Cloud Monitoring alert policies. The other options fall short: Datastream has no built-in validation feature, individual Dataflow pipelines require per-table logic, and manual EXCEPT queries against BigQuery snapshots and exported Oracle CSV files do not scale and publish nothing to Cloud Monitoring. BigQuery Data Transfer Service likewise offers no validation capability. Deploying the DVT Dataflow template is therefore the most efficient, scalable, and monitorable option.
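To make the idea of a row-level checksum comparison concrete, here is a minimal Python sketch. It is not DVT's implementation and does not call DVT, Oracle, or BigQuery; the function names (`row_hash`, `compare_rows`, `to_log_lines`), the sample rows, and the table name `sales.orders` are all hypothetical. It hashes each row over a chosen column list, compares source and target by primary key, and renders each mismatch as a JSON line of the kind a structured log sink could ingest.

```python
import hashlib
import json


def row_hash(row, columns):
    """Return a SHA-256 hex digest over the chosen columns of one row."""
    # Join values with a separator so ("ab", "c") and ("a", "bc") hash
    # differently; None gets its own marker so it differs from "None".
    parts = ["\x00NULL" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def compare_rows(source_rows, target_rows, key, columns):
    """Compare two row sets by primary key and return a list of mismatches."""
    src = {r[key]: row_hash(r, columns) for r in source_rows}
    tgt = {r[key]: row_hash(r, columns) for r in target_rows}
    mismatches = []
    for k in sorted(src.keys() | tgt.keys()):
        if k not in tgt:
            status = "missing_in_target"
        elif k not in src:
            status = "missing_in_source"
        elif src[k] != tgt[k]:
            status = "hash_mismatch"
        else:
            continue  # hashes agree, nothing to report
        mismatches.append({"key": k, "status": status})
    return mismatches


def to_log_lines(table, mismatches):
    """Render one JSON line per mismatch, suitable for structured logging."""
    return [json.dumps({"table": table, **m}, sort_keys=True) for m in mismatches]


if __name__ == "__main__":
    # Hypothetical sample data standing in for an Oracle table and its copy.
    oracle = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": None}]
    bigquery = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]
    diffs = compare_rows(oracle, bigquery, "id", ["id", "name"])
    for line in to_log_lines("sales.orders", diffs):
        print(line)
```

Because the table name, key, and column list are plain parameters, the same code could be driven from one configuration file covering many tables. A log-based metric could then count entries by their `status` field, assuming the lines are written to a log that Cloud Logging collects.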
GCP Professional Data Engineer
Designing data processing systems