An e-commerce company plans to run online validation of a new ranking model. The current production model (champion) will continue to serve users, while 50% of requests are randomly routed to a challenger. Business stakeholders want to replace the champion only if the challenger shows at least a 2% lift in click-through rate (CTR). Which step is most critical before traffic is split to ensure the experiment yields statistically valid evidence?
Enable detailed feature logging for the challenger so offline explainability tools can be applied after the test.
Increase the inference endpoint's autoscaling threshold so both variants can absorb peak traffic without throttling.
Deploy the challenger only in shadow mode, receiving mirrored traffic that does not affect live users.
Compute the minimum number of user impressions needed to detect a 2% absolute lift in CTR at the chosen significance and power levels.
Determining the minimum sample size with a power analysis, which requires choosing a significance level (α) and a statistical power (1 - β), is the critical step. It ensures the A/B test collects enough impressions to reliably detect a 2% lift while controlling the Type I and Type II error rates. If the sample size is too small, the test may end prematurely or report a difference that cannot be distinguished from noise, leading to false conclusions. The other actions improve observability (feature logging), safety (shadow mode), or reliability (autoscaling), but none of them guarantees that an observed CTR difference is statistically meaningful.
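As a rough illustration of the power analysis, the sketch below estimates the impressions needed per variant using a two-proportion z-test via statsmodels. The baseline CTR of 10%, α = 0.05, and power = 0.80 are illustrative assumptions that are not given in the question; only the 2% absolute lift comes from the scenario.

```python
# Minimal sketch: sample size for detecting a 2% absolute CTR lift.
# Assumes statsmodels is installed; baseline CTR, alpha, and power are
# illustrative assumptions, not values from the question.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_ctr = 0.10   # assumed champion CTR
lift = 0.02           # 2% absolute lift the test must detect
alpha = 0.05          # significance level (Type I error rate)
power = 0.80          # statistical power (1 - Type II error rate)

# Cohen's h effect size for the two proportions.
effect_size = proportion_effectsize(baseline_ctr + lift, baseline_ctr)

# Impressions needed per variant for a two-sided two-proportion z-test,
# with a 50/50 traffic split (ratio=1.0).
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    ratio=1.0,
    alternative="two-sided",
)
print(f"Minimum impressions per variant: {int(round(n_per_variant))}")
```

With these assumed numbers the test needs on the order of a couple of thousand impressions per variant; a lower baseline CTR or a stricter α/power choice would push that figure higher.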