Your health-insurance company ingests millions of call-center transcripts from Cloud Storage into BigQuery each day for trend analysis. Regulations forbid storing clear-text PII such as customer names and phone numbers, yet analysts must be able to deterministically group conversations that belong to the same customer for audits. You want a scalable, fully managed solution that requires minimal custom code and lets you add new PII detectors later. Which design should you implement?
Enable BigQuery column-level security on the PII columns and grant access only to authorized roles while keeping the original transcripts unchanged in BigQuery.
Load the raw transcripts into BigQuery first and use SQL REGEXP_REPLACE functions in scheduled queries to overwrite PII columns with randomly generated strings.
Run the Google-provided Dataflow template that reads files from Cloud Storage, de-identifies them with Cloud DLP, and writes them to BigQuery, using a de-identification configuration that applies CryptoDeterministicConfig with a customer-managed Cloud KMS key so that names and phone numbers are tokenized before the data is loaded into BigQuery.
Encrypt each transcript locally with a customer-supplied encryption key (CSEK) and load the encrypted files directly into BigQuery so analysts can decrypt data when needed.
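Cloud Data Loss Prevention (DLP) provides fully managed inspection and de-identification that scales automatically. Running the Google-provided Dataflow template that reads files from Cloud Storage, de-identifies them with Cloud DLP, and writes the results to BigQuery, with a de-identification configuration that applies CryptoDeterministicConfig, replaces every detected value of the configured infoTypes (for example, PERSON_NAME and PHONE_NUMBER) with a stable token before any data reaches BigQuery. CryptoDeterministicConfig encrypts each value with AES-SIV using a data key that is wrapped by a customer-managed Cloud KMS key, so the same source value always produces the same token as long as the same key is used. Analysts can therefore join or aggregate records that refer to the same customer without seeing the original values, while re-identification remains possible only for callers permitted to use the KMS key. New PII detectors can be added later by extending the infoTypes in the inspection configuration instead of writing new code. The alternatives fall short: BigQuery column-level security still stores clear-text PII, which the regulations forbid; REGEXP_REPLACE with random strings lands the raw PII in BigQuery first and gives the same customer different values, so conversations can no longer be grouped; and client-side encryption with a CSEK leaves BigQuery holding ciphertext that analysts cannot query or group without custom decryption logic. Therefore, running the Dataflow DLP de-identification template with deterministic cryptographic tokenization best satisfies the privacy, scalability, and maintainability requirements.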
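As a rough illustration of the de-identification configuration described above, the Python sketch below builds a request body for the Cloud DLP v2 `deidentifyTemplates.create` REST method. It uses only the standard library and does not call the API. The project, key ring, key name, wrapped key, template ID, and surrogate names (`NAME_TOKEN`, `PHONE_TOKEN`) are hypothetical placeholders. Assuming the Google-provided Cloud Storage to BigQuery DLP template is used, the created template's resource name would then be passed to the Dataflow job through its `deidentifyTemplateName` parameter.

```python
import json

# Hypothetical resource name: replace with your own project, key ring, and key.
KMS_KEY_NAME = (
    "projects/example-project/locations/global/"
    "keyRings/dlp-keyring/cryptoKeys/dlp-tokenization-key"
)
# Hypothetical placeholder: a base64-encoded data key wrapped (encrypted) by KMS_KEY_NAME.
WRAPPED_KEY_B64 = "REPLACE_WITH_BASE64_WRAPPED_KEY"


def deterministic_transformation(info_type: str, surrogate: str) -> dict:
    """Tokenize one infoType with CryptoDeterministicConfig and a KMS-wrapped key."""
    return {
        "infoTypes": [{"name": info_type}],
        "primitiveTransformation": {
            "cryptoDeterministicConfig": {
                "cryptoKey": {
                    "kmsWrapped": {
                        "wrappedKey": WRAPPED_KEY_B64,
                        "cryptoKeyName": KMS_KEY_NAME,
                    }
                },
                # DLP prefixes each token with this surrogate name so an authorized
                # caller can later locate the tokens for re-identification.
                "surrogateInfoType": {"name": surrogate},
            }
        },
    }


def build_deidentify_template_body(template_id: str) -> dict:
    """Request body for the DLP v2 deidentifyTemplates.create REST method."""
    return {
        "templateId": template_id,
        "deidentifyTemplate": {
            "displayName": "Call-center transcript tokenization",
            "deidentifyConfig": {
                "infoTypeTransformations": {
                    "transformations": [
                        deterministic_transformation("PERSON_NAME", "NAME_TOKEN"),
                        deterministic_transformation("PHONE_NUMBER", "PHONE_TOKEN"),
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be POSTed to .../deidentifyTemplates.
    print(json.dumps(build_deidentify_template_body("transcript-deid"), indent=2))
```

Using one transformation per infoType gives names and phone numbers distinct surrogate types, which keeps them distinguishable at re-identification time; a single transformation that lists both infoTypes would also be valid. Adding a new detector later amounts to appending another entry to `transformations` (and to the inspection configuration).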
GCP Professional Data Engineer
Designing data processing systems