You are preparing the feature request_latency_ms for a regression model. Exploratory analysis shows that only the upper tail is problematic: about 2 % of records exceed 3 000 ms, while the lower tail appears clean. You must preserve all rows but limit the influence of those extreme high values using SciPy's winsorize so that only the top 2 % of observations are capped and the lower tail is left completely untouched.
import numpy as np
from scipy.stats.mstats import winsorize
latency = np.load('latency.npy')
latency_clean = winsorize(latency, limits=______) # fill in
Which tuple correctly replaces ______ to meet the requirement?
The limits argument in scipy.stats.mstats.winsorize expects a two-element tuple (lower, upper) containing the proportions of data to Winsorize on each side. A value of None designates an open interval-no Winsorization-on that side. Therefore, to leave the lower tail unchanged and Winsorize only the upper 2 %, you pass (None, 0.02).
(None, 0.02) sets no lower bound Winsorization and caps the highest 2 % of observations at the 98th percentile, preserving all rows.
(0.02, 0.02) Winsorizes both 2 % tails, altering the lower tail, which violates the scenario.
(0.02, None) Winsorizes only the lower 2 %, leaving the high outliers untouched.
(0.0, 0.02) specifies 0 % on the lower tail, which technically works but still computes an unnecessary quantile and does not satisfy the explicit instruction to leave the lower limit open.
Hence, (None, 0.02) is the only tuple that meets all stated requirements.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is Winsorization, and why is it used?
Open an interactive chat with Bash
How does the `limits` argument in `scipy.stats.mstats.winsorize` work?
Open an interactive chat with Bash
Why is `(None, 0.02)` better than `(0.0, 0.02)` in this case?