GCP Professional Data Engineer Practice Question

Your analytics team stores historical customer data in BigQuery and wants to build a churn-prediction logistic-regression model with BigQuery ML. Several continuous columns contain occasional NULLs and strong outliers, while the string column plan_type has five distinct values. The team insists that all preprocessing logic be declared inside the model definition so it is version-controlled together with the model and reruns automatically every time the model is retrained. Which approach best meets these requirements while minimizing operational overhead?

  • Create the model with a TRANSFORM clause that applies ML.IMPUTER, ML.ROBUST_SCALER, and ML.ONE_HOT_ENCODER to the raw columns inside the CREATE MODEL … OPTIONS(model_type = 'logistic_reg') statement.

  • Run a Cloud Dataflow pipeline to write a cleaned staging table, then point a CREATE MODEL statement (without TRANSFORM) at the staged data.

  • Issue CREATE MODEL without any preprocessing and rely on BigQuery ML's automatic handling of missing values and scaling.

  • Build a materialized view that encodes and scales the features, refresh it automatically, and train the logistic-regression model against the view.
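For reference, the TRANSFORM-clause approach described in the first option could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. The column names tenure_months, monthly_charges, and churned, the model and table names, and the helper build_create_model_sql are all hypothetical; only plan_type and the three ML.* functions come from the question. Each function is applied to a separate column here, and the sketch does not try to chain imputation and scaling on the same column.

```python
def build_create_model_sql(model: str, source_table: str) -> str:
    """Render a CREATE MODEL statement that keeps preprocessing in TRANSFORM.

    All column names other than plan_type are assumptions for illustration.
    """
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM(
  -- Fill NULLs with the column median computed during training.
  ML.IMPUTER(tenure_months, 'median') OVER () AS tenure_months,
  -- Scale using median and interquartile range to limit the pull of outliers.
  ML.ROBUST_SCALER(monthly_charges) OVER () AS monthly_charges,
  -- Expand the distinct plan_type values into indicator features.
  ML.ONE_HOT_ENCODER(plan_type) OVER () AS plan_type,
  -- Pass the label through unchanged (hypothetical label column).
  churned
)
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT tenure_months, monthly_charges, plan_type, churned
FROM `{source_table}`
""".strip()


# Hypothetical dataset and table names.
sql = build_create_model_sql("my_dataset.churn_model", "my_dataset.customers")
print(sql)
```

Because the preprocessing lives in the statement itself, the rendered SQL can be kept in version control as a single file, and rerunning it retrains the model with the same transformations. Submitting it (for example with the bq command-line tool or a BigQuery client library) is left out of the sketch.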

GCP Professional Data Engineer
Preparing and using data for analysis