A streaming service data scientist is building a logistic regression model to predict whether a subscriber will churn in the next quarter. The raw dataset contains two continuous variables:
TotalSpent - total amount (USD) the customer paid in the last 12 months
MonthlyBudget - the customer's self-reported discretionary budget (USD)
MonthlyBudget ranges from about 50 USD to 10 000 USD, so absolute spending values vary by several orders of magnitude. Exploratory analysis shows that customers who spend at least 60 % of their budget rarely churn, regardless of how many dollars they actually spend.
Which transformation best encodes this insight and minimizes the influence of raw scale differences when training the model?
Apply Z-score standardization separately to TotalSpent and MonthlyBudget so that each has mean 0 and variance 1.
Create a new feature spend_to_budget_ratio = TotalSpent / (12 * MonthlyBudget) and use this ratio in place of the two original variables.
Introduce an interaction term by multiplying the two variables (TotalSpent * MonthlyBudget) and keep both originals.
Replace TotalSpent with log10(TotalSpent) to reduce skewness while leaving MonthlyBudget unchanged.
Dividing TotalSpent by the customer's available budget produces a unit-less ratio that expresses spending intensity directly (e.g., 0.45 or 0.78). Because both inputs have the same unit (dollars), the quotient removes absolute scale, making the feature comparable across customers of very different means. Standardizing or logging the original variables reduces skew or scaling issues individually, but the model would still need to learn their relationship from scratch. Multiplying the variables does the opposite of what is needed: it magnifies scale differences and obscures the relative pattern that drives churn.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
Why is dividing `TotalSpent` by `(12 * MonthlyBudget)` an effective transformation?
Open an interactive chat with Bash
What is the disadvantage of using Z-score standardization in this case?
Open an interactive chat with Bash
Why doesn't multiplying `TotalSpent` and `MonthlyBudget` solve the scale difference issue?