A retail company ingests raw clickstream files into a Cloud Storage zone in Dataplex and transforms them into partitioned BigQuery tables for reporting. The compliance team has two new requirements:
1. Automatically maintain end-to-end lineage between the raw objects and the curated BigQuery tables so that auditors can trace every record's origin.
2. Trigger an on-call alert whenever more than 2% of rows in the curated tables violate data-quality rules that verify primary-key uniqueness and the presence of required columns.

You must meet both requirements while minimizing custom code and operational toil. What should you do?
Schedule Cloud DLP inspection jobs on the BigQuery tables, send findings to Pub/Sub, and configure an alerting policy on the Pub/Sub topic; document lineage by manually adding Data Catalog tags.
Export BigQuery and Cloud Storage audit logs to BigQuery, join them with Cloud Asset Inventory feeds to infer lineage, and query the logs with scheduled queries that raise alerts when rule violations exceed 2 %.
In Dataplex, create a Data Quality scan on the curated BigQuery asset with the required rules, configure a Cloud Monitoring alert on the failed_rows_ratio metric that the scan exports, and enable the Data Lineage API to let Dataplex capture lineage automatically.
Deploy Apache Atlas on a long-running Dataproc cluster for lineage tracking, and build a Dataflow pipeline that computes data-quality statistics and writes custom metrics to Cloud Monitoring.
Dataplex natively provides both required capabilities without additional custom services. Create a Data Quality scan on the curated BigQuery asset with rules that check primary-key uniqueness and non-null required columns. The scan exports the metric dataplex.googleapis.com/scan/data_quality/failed_rows_ratio to Cloud Monitoring, where an alerting policy can fire when the value exceeds 0.02 (2%). For lineage, enable the Data Lineage API in the same project. Dataplex then automatically captures lineage between the Cloud Storage asset and the BigQuery tables whenever BigQuery jobs or Dataproc/Dataflow pipelines move the data, which satisfies the audit requirement. The other options increase operational overhead. Cloud DLP inspects for sensitive data rather than key uniqueness or required columns, and it leaves lineage to manual tagging. Joining audit logs with asset feeds amounts to custom lineage inference built on scheduled queries. A self-managed Apache Atlas cluster plus a custom Dataflow pipeline adds third-party tooling and code to maintain.
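To make the 2% threshold concrete, the sketch below evaluates a uniqueness rule and a non-null rule over in-memory rows and compares the resulting failed-rows ratio against 0.02. It is a local simulation of the arithmetic only, not the Dataplex API. The function names, the column names, and the choice to combine both rules into a single ratio (counting each row at most once) are assumptions for illustration. Dataplex itself evaluates rules inside BigQuery and reports results per rule.

```python
from collections import Counter
from typing import Any, Mapping, Sequence

# Hypothetical constant mirroring the "more than 2% of rows" requirement.
ALERT_THRESHOLD = 0.02


def failed_rows_ratio(
    rows: Sequence[Mapping[str, Any]],
    key_column: str,
    required_columns: Sequence[str],
) -> float:
    """Return the fraction of rows failing a uniqueness or non-null check.

    A row fails if its key value occurs more than once, or if any required
    column is missing or None. Each row is counted at most once, even if it
    violates several rules (an assumption of this sketch).
    """
    if not rows:
        return 0.0
    # Count how often each primary-key value appears across the table.
    key_counts = Counter(row.get(key_column) for row in rows)
    failed = 0
    for row in rows:
        duplicate_key = key_counts[row.get(key_column)] > 1
        missing_value = any(row.get(col) is None for col in required_columns)
        if duplicate_key or missing_value:
            failed += 1
    return failed / len(rows)


def should_alert(ratio: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """Fire only when the ratio strictly exceeds the threshold."""
    return ratio > threshold
```

For example, four rows where `event_id` 2 appears twice and one of those rows also lacks `user_id` yield a ratio of 0.5, so `should_alert` returns `True`. Using a strict `>` comparison matches the "more than 2%" wording: exactly 2 failing rows out of 100 does not alert, while 3 out of 100 does.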
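Once lineage is captured, an auditor's question "where did this table come from?" becomes a walk upstream through a graph of source-to-target links. The sketch below shows that traversal over a small hand-written graph. The asset names and the `bq:`/`gcs:` prefixes are hypothetical placeholders, not identifiers returned by the Data Lineage API, and the dictionary stands in for whatever lineage records you would actually query.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each target maps to the assets it was built from.
# Names are illustrative placeholders, not real Data Lineage API identifiers.
LINEAGE: Dict[str, List[str]] = {
    "bq:reporting.daily_sessions": ["bq:curated.clickstream_events"],
    "bq:curated.clickstream_events": [
        "gcs:raw-zone/clicks/part-0001.json",
        "gcs:raw-zone/clicks/part-0002.json",
    ],
}


def upstream_assets(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every asset reachable upstream from `asset` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for source in lineage.get(current, []):
            # The `seen` set also protects against cycles in the graph.
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


def raw_origins(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return upstream assets with no parents of their own (the raw objects)."""
    return {a for a in upstream_assets(asset, lineage) if a not in lineage}
```

Calling `raw_origins("bq:reporting.daily_sessions", LINEAGE)` returns the two raw Cloud Storage objects, which is the kind of end-to-end trace the audit requirement asks for.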
GCP Professional Data Engineer
Storing the data