You are building a BigQuery ML logistic-regression model on table prod.customers, which contains nullable numeric columns (usage_minutes, tenure_days) and a high-cardinality STRING column plan_type. Analysts will later call ML.PREDICT directly on the raw table from BI dashboards. You need to guarantee that missing numeric values are mean-imputed and that plan_type is one-hot encoded during both model training and every subsequent prediction, without requiring any additional preprocessing SQL in the dashboards. What should you do?
Apply only numeric normalization in the TRANSFORM clause and instruct dashboard developers to one-hot encode plan_type within their ML.PREDICT queries.
Create a materialized view that performs the imputation and one-hot encoding, train the model on that view, and require dashboards to invoke ML.PREDICT against the view instead of the raw table.
Run a scheduled Dataflow pipeline that writes a fully preprocessed feature table; instruct dashboards to join to this table before calling ML.PREDICT so that the model receives clean features.
Specify a TRANSFORM clause when you CREATE MODEL, using ML.IMPUTER for the numeric columns and ML.ONE_HOT_ENCODER for plan_type; BigQuery ML will reuse these transformations automatically during ML.PREDICT.
Define the preprocessing inside the TRANSFORM clause of the CREATE MODEL statement. Calling ML.IMPUTER with the 'mean' strategy on usage_minutes and tenure_days replaces missing numeric values with the column mean, and applying ML.ONE_HOT_ENCODER to plan_type converts the high-cardinality string column into sparse indicator features, optionally limited to the most frequent categories by top_k. BigQuery ML stores these transformations with the model and automatically reapplies them during ML.PREDICT, so dashboard queries can run predictions on raw records without repeating the logic. External pipelines, materialized views, and dashboard-side encoding all keep the preprocessing outside the model, so they require manual coordination and risk skew between the features used in training and those used in prediction. BigQuery ML also does not offer an auto_preprocess flag in CREATE MODEL.
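As a rough illustration of the approach described above, the statement might look like the following sketch. The model name `prod.churn_model`, the label column `churned`, the output aliases, and the top_k value of 50 are assumptions made for the example; only the table and feature column names come from the question.

```sql
-- Sketch only: model name, label column (churned), aliases, and top_k = 50
-- are hypothetical and not taken from the question.
CREATE OR REPLACE MODEL `prod.churn_model`
TRANSFORM (
  -- Replace NULLs with the column mean computed over the training data.
  ML.IMPUTER(usage_minutes, 'mean') OVER () AS usage_minutes_imputed,
  ML.IMPUTER(tenure_days, 'mean') OVER () AS tenure_days_imputed,
  -- One-hot encode plan_type; 'none' drops no category, and top_k caps
  -- the vocabulary at the 50 most frequent values.
  ML.ONE_HOT_ENCODER(plan_type, 'none', 50) OVER () AS plan_type_encoded,
  -- Pass the label through unchanged.
  churned
)
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT usage_minutes, tenure_days, plan_type, churned
FROM `prod.customers`;

-- Dashboards pass raw columns; the stored TRANSFORM is applied automatically.
SELECT *
FROM ML.PREDICT(
  MODEL `prod.churn_model`,
  (SELECT usage_minutes, tenure_days, plan_type FROM `prod.customers`)
);
```

The prediction query contains no imputation or encoding logic of its own, which is the property the question asks for: the statistics learned at training time travel with the model.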
GCP Professional Data Engineer
Preparing and using data for analysis