Your team is building a predictive-maintenance model for turbine engines. The continuous feature fuel_flow_rate (kg/s) is strongly right-skewed, and about 3 % of the observations are exactly 0 kg/s when the engine is idle. To stabilize variance and approximate normality you decide to call scipy.stats.boxcox so the optimal Box-Cox power λ can be estimated. Which preparatory step is required before invoking the function so that the transformation and its likelihood calculation are well-defined for every observation?
Standardize the feature to zero mean and unit variance; this alone makes Box-Cox valid.
Add the same small positive constant to every value so the entire feature becomes strictly greater than 0, then run Box-Cox.
Mark the idle (0 kg/s) rows as missing or drop them, and apply Box-Cox to the remaining records unchanged.
Subtract the sample mean to center the data around zero before applying Box-Cox.
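Box-Cox is defined only for strictly positive inputs. The transform is y(λ) = (y^λ − 1)/λ for λ ≠ 0 and ln y for λ = 0. The log-likelihood used to estimate λ also contains the term (λ − 1) Σ ln yᵢ. At y = 0 the logarithm is undefined, and y^λ diverges for λ < 0. For y < 0, non-integer powers are not real numbers. Every observation must therefore be shifted above zero by adding the same positive constant c (for example, 1 × 10⁻⁶ or 1 on the native scale). The choice of c influences the estimated λ, because a very small c places the former zeros far out in the left tail after the transform. Centering or standardizing produces negative values, so it makes the problem worse, and scipy.stats.boxcox raises a ValueError if any value is non-positive. Dropping the idle rows would let the function run, but it discards legitimate observations and leaves the transformation undefined for exactly the idle readings the model must still handle. Shifting by a constant preserves the ordering of observations and lets λ be estimated on valid data. To return to the original scale, apply the inverse Box-Cox transform and then subtract c.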
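The sketch below makes the shift-then-transform recipe concrete. It uses synthetic, hypothetical fuel_flow_rate data: a lognormal sample with roughly 3% of values forced to exactly 0 kg/s. It also assumes a shift of c = 1.0 kg/s; both choices are illustrative only. The example shows that scipy.stats.boxcox rejects the raw data, estimates λ on the shifted data, and recovers the original values with scipy.special.inv_boxcox followed by subtracting the shift.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Hypothetical synthetic data: right-skewed fuel flow (kg/s) with ~3% idle zeros.
rng = np.random.default_rng(0)
fuel_flow_rate = rng.lognormal(mean=0.5, sigma=0.8, size=1000)
idle = rng.random(1000) < 0.03
fuel_flow_rate[idle] = 0.0

# Applying Box-Cox directly to data containing zeros raises ValueError.
try:
    stats.boxcox(fuel_flow_rate)
    direct_error = None
except ValueError as exc:
    direct_error = str(exc)
    print("boxcox on raw data failed:", direct_error)

# Shift every observation by the same positive constant (assumed c = 1.0 kg/s).
SHIFT = 1.0
shifted = fuel_flow_rate + SHIFT

# With lmbda=None, boxcox returns the transformed data and the MLE of lambda.
transformed, lam = stats.boxcox(shifted)
print(f"estimated lambda: {lam:.3f}")

# Back-transform: invert Box-Cox first, then remove the shift.
recovered = inv_boxcox(transformed, lam) - SHIFT
print("max round-trip error:", np.max(np.abs(recovered - fuel_flow_rate)))
```

Re-running with a different SHIFT (for example 1e-6) is a quick way to see how strongly the estimated λ depends on the chosen constant for a given dataset.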