A data scientist is analyzing a univariate time series that contains 120 monthly observations of an online retailer's gross revenue (January 2015 to December 2024). A plot indicates that both the mean and the variance grow over time. To check whether the series is stationary, the analyst runs two tests:
Augmented Dickey-Fuller (ADF): test statistic = -2.10, p-value = 0.23
KPSS (trend): test statistic = 0.95, p-value = 0.01 (5% critical value = 0.146)
The analyst plans to fit an ordinary least-squares (OLS) regression to the level data for forecasting.
Which statement identifies the most critical data issue and a suitable first step to address it before modeling?
Sparse data leading to overfitting; aggregate the monthly data to quarterly frequency and impute missing observations.
Multicollinearity among explanatory variables; compute variance inflation factors and drop highly correlated predictors.
Non-linearity in the relationship between revenue and time; replace the linear regression with a higher-order polynomial without altering the data.
Non-stationarity caused by a stochastic trend and changing variance; apply first-order differencing (and/or a variance-stabilizing transformation) to make the series stationary before modeling.
The ADF p-value of 0.23 is above 0.05, so the null hypothesis that the series has a unit root cannot be rejected. The KPSS p-value of 0.01 is below 0.05, so the null that the series is trend-stationary is rejected. Because both tests point the same way, the results indicate non-stationarity driven by a stochastic trend, and the plot adds evidence of rising variance. Fitting an OLS model to the levels of a non-stationary series risks spurious regression and unreliable inference. The standard initial remedy is to apply a variance-stabilizing transformation such as a log or Box-Cox when the variance grows with the level, then take first differences, so that the transformed series has an approximately constant mean and variance. The other options describe issues (multicollinearity, sparsity, or simple non-linearity) that are not evidenced by the tests or by the behavior of the single time series.
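The remedy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the revenue figures are simulated (the question does not supply the data), the drift and noise values are arbitrary, and the helper names `log_diff` and `half_stats` are hypothetical. The sketch takes logs to stabilize the variance, differences once to remove the stochastic trend, and compares the two halves of each series.

```python
import math
import random
import statistics


def log_diff(series):
    """Return first differences of the natural log of a positive series."""
    logs = [math.log(x) for x in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


def half_stats(values):
    """Return (mean, stdev) for the first half and the second half."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (
        (statistics.mean(first), statistics.stdev(first)),
        (statistics.mean(second), statistics.stdev(second)),
    )


# Assumption: simulated stand-in for the 120 monthly revenue figures.
# A random walk with drift on the log scale makes both the level and
# the spread of the raw series grow over time.
rng = random.Random(42)
log_level = math.log(100_000.0)
revenue = []
for _ in range(120):
    log_level += rng.gauss(0.015, 0.05)  # arbitrary drift and noise
    revenue.append(math.exp(log_level))

# Log transform, then first difference: approximate monthly growth rates.
growth = log_diff(revenue)

print("levels, first vs second half:", half_stats(revenue))
print("log-differences, first vs second half:", half_stats(growth))
```

If the two halves of the log-differenced series show similar means and standard deviations, that is consistent with stationarity, though it does not prove it. In practice the analyst would re-run the ADF and KPSS tests on the transformed series, for example with `adfuller` and `kpss` from `statsmodels.tsa.stattools`, before fitting a model.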