A data scientist receives a monthly sales time-series that shows a strong upward trend. After computing a first non-seasonal difference the series looks roughly stationary and its standard deviation falls from 48.2 to 33.9 units. Curious about further improvement, the scientist applies a second non-seasonal difference. The new series now has a standard deviation of 52.4 units and a lag-1 autocorrelation of −0.62. Which conclusion most accurately describes the data issue that has occurred and an appropriate next action?
The negative lag-1 ACF reveals hidden seasonality; keep the second difference and add an additional seasonal difference to remove the seasonal cycle.
The higher variance signals heteroskedasticity exposed by differencing; maintain the second difference but stabilize variance with a Box-Cox transformation.
The second difference has over-differenced the series, injecting noise and strong negative autocorrelation; revert to the first-differenced series (or the original levels) and add AR/MA terms if needed.
Variance inflation indicates that missing values were propagated; impute the gaps and repeat the second differencing step to restore stationarity.
The combination of a large negative lag-1 autocorrelation and an increase (rather than a decrease) in variance are classic warning signs of over-differencing. Differencing is intended to remove low-frequency structure and make the series stationary, but applying more differences than necessary injects noise, drives the first-lag ACF deep into the negative region (often past −0.5), and can inflate variance. The correct remedy is to roll back to the first (or zero) difference and, if needed, model any remaining structure with AR or MA terms rather than continue adding differences. The other options misdiagnose the symptom (they attribute it to heteroskedasticity, seasonality, or missing-value bias) and would encourage further transformations that are unlikely to solve the actual problem.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What does over-differencing mean in time series analysis?
Open an interactive chat with Bash
What is lag-1 autocorrelation, and why is it significant?
Open an interactive chat with Bash
When should AR or MA terms be used instead of differencing?