A data scientist is investigating potential replay attacks across two web servers. She has loaded the 5 GB log exports into Pandas DataFrames df1 and df2, each with columns session_id, user_id, timestamp, and request_size. Because the security team only wants sessions that occur on both servers, she needs to create a DataFrame that contains exactly those sessions and the shared columns. Which single Pandas command satisfies this requirement with minimal extra rows or columns?
Using merge() with how="inner" performs an inner join on the specified key. An inner join returns only the rows whose key values exist in both DataFrames, which is exactly the intersection of the two datasets. how="outer" would produce the union of both logs, including sessions unique to either server. Concatenating and deduplicating would also keep the union rather than the intersection, and, depending on the join setting passed to concat, could drop columns that differ between the two frames. A left join with an indicator flag still retains rows unique to df1 until they are filtered out in a separate step. Therefore, the inner merge is the only option that directly yields the desired intersection.
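The difference between these joins can be seen on a small example. The sketch below uses tiny made-up frames in place of the real log exports and assumes session_id is the join key; the sample values and the suffixes _srv1 and _srv2 are illustrative choices, not part of the original question.

```python
import pandas as pd

# Hypothetical stand-ins for the two server logs (values are made up).
df1 = pd.DataFrame({
    "session_id": ["a1", "b2", "c3"],
    "user_id": [10, 20, 30],
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 10:10"]
    ),
    "request_size": [512, 1024, 2048],
})
df2 = pd.DataFrame({
    "session_id": ["b2", "c3", "d4"],
    "user_id": [20, 30, 40],
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:06", "2024-01-01 10:11", "2024-01-01 10:20"]
    ),
    "request_size": [1024, 2048, 4096],
})

# Inner join: only sessions whose session_id appears in both frames.
# Assumed key: session_id. The suffixes label each server's copy of the
# non-key columns that share a name.
both = pd.merge(df1, df2, on="session_id", how="inner",
                suffixes=("_srv1", "_srv2"))

# Outer join: the union, including sessions seen on only one server.
union = pd.merge(df1, df2, on="session_id", how="outer",
                 suffixes=("_srv1", "_srv2"))

# Left join with an indicator: keeps every df1 row, so a second
# filtering step is needed to reach the intersection.
left = pd.merge(df1, df2, on="session_id", how="left", indicator=True)
left_both = left[left["_merge"] == "both"]

print(both["session_id"].tolist())
print(len(union), len(left), len(left_both))
```

Here the inner merge reaches the shared sessions in one call, while the left join needs the extra filter on the _merge column to get to the same rows.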