Your analytics team stores historical customer data in BigQuery and wants to build a churn-prediction logistic-regression model with BigQuery ML. Several continuous columns contain occasional NULLs and strong outliers, while the string column plan_type has five distinct values. The team insists that all preprocessing logic be declared inside the model definition so it is version-controlled together with the model and reruns automatically every time the model is retrained. Which approach best meets these requirements while minimizing operational overhead?
Create the model with a TRANSFORM clause that applies ML.IMPUTER, ML.ROBUST_SCALER, and ML.ONE_HOT_ENCODER to the raw columns inside the CREATE MODEL … OPTIONS(model_type = 'logistic_reg') statement.
Run a Cloud Dataflow pipeline to write a cleaned staging table, then point a CREATE MODEL statement (without TRANSFORM) at the staged data.
Issue CREATE MODEL without any preprocessing and rely on BigQuery ML's automatic handling of missing values and scaling.
Build a materialized view that encodes and scales the features, refresh it automatically, and train the logistic-regression model against the view.
BigQuery ML's TRANSFORM clause lets you declare feature-engineering steps inside the CREATE MODEL statement. ML.IMPUTER fills missing values (for numeric columns, with the mean or the median), ML.ROBUST_SCALER scales outlier-prone numerical features using the median and interquartile range, and ML.ONE_HOT_ENCODER converts categorical strings like plan_type into sparse binary vectors. Because the transformations are stored as part of the model, they are re-executed every time the model is trained or retrained and are applied automatically to new inputs at prediction time. The preprocessing logic therefore stays co-located with, and versioned alongside, the model definition without requiring separate pipelines or manually maintained staging tables.
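As an illustration, a minimal sketch of such a statement might look like the following. The dataset, table, and column names (`mydataset`, `customer_history`, `monthly_charges`, `support_calls`, `churned`) are hypothetical and not taken from the question; only `plan_type` is. Each function is applied to a different column to keep the sketch simple.

```sql
-- Sketch only: dataset, table, and most column names are assumptions.
CREATE OR REPLACE MODEL `mydataset.churn_model`
TRANSFORM(
  -- Fill occasional NULLs with the column median.
  ML.IMPUTER(monthly_charges, 'median') OVER () AS monthly_charges_imputed,
  -- Scale an outlier-prone column using its median and interquartile range.
  ML.ROBUST_SCALER(support_calls) OVER () AS support_calls_scaled,
  -- Encode the five plan_type values as a sparse one-hot vector.
  ML.ONE_HOT_ENCODER(plan_type) OVER () AS plan_type_encoded,
  -- The label must pass through TRANSFORM so training can see it.
  churned
)
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT
  monthly_charges,
  support_calls,
  plan_type,
  churned
FROM `mydataset.customer_history`;
```

Rerunning this one statement retrains the model and recomputes the imputation, scaling, and encoding statistics from the current data, so the statement itself is the only artifact that needs to be kept under version control.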
Using an external Dataflow job or a materialized view would work technically, but the transformation code would live outside the model and need separate orchestration. Relying on BigQuery ML's automatic preprocessing is not enough either: by default it imputes numeric NULLs with the mean and standardizes numeric features, and both steps are sensitive to strong outliers. It does not apply median imputation or robust scaling, so the outlier problem would remain. Therefore, embedding the preprocessing functions in the TRANSFORM clause is the most efficient and maintainable solution.
GCP Professional Data Engineer
Preparing and using data for analysis