A data analyst at an online retail company is examining a dataset of customer transactions from the past quarter. The initial calculation of the average transaction value is significantly higher than historical averages. The analyst suspects that one or more unusually large transactions might be skewing this metric. To investigate this, the analyst needs to formally identify these potential outliers. Which of the following is the most effective method for this purpose?
Calculate the mean and standard deviation of all transaction values to establish a baseline.
Calculate the interquartile range (IQR) and identify data points that fall above the upper fence (Q3 + 1.5 * IQR).
Check for duplicate transaction IDs to ensure each sale is recorded only once.
Group the data by customer ID and count the number of transactions for each.
The correct answer is to use the interquartile range (IQR) to identify data points outside the main distribution. The IQR method is a standard and robust technique for identifying outliers because it is less sensitive to the extreme values themselves, unlike methods that rely on the mean and standard deviation. This method defines outliers as any data point that falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
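The fence calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction values are made up, the helper names `iqr_fences` and `iqr_outliers` are hypothetical, and the quartiles use the standard library's "inclusive" interpolation method. Other quartile conventions give slightly different fences.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) using Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical transaction values; one is far larger than the rest.
transactions = [42.0, 55.5, 38.0, 61.25, 47.0, 52.0, 49.5, 44.0, 58.0, 2500.0]

lower, upper = iqr_fences(transactions)
print(f"lower fence: {lower}, upper fence: {upper}")
print("outliers:", iqr_outliers(transactions))
```

With this sample data, Q1 is 44.75 and Q3 is 57.375, so the upper fence is 76.3125 and only the 2500.0 transaction is flagged.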
Calculating the mean and standard deviation is part of the Z-score method for finding outliers, but the mean itself is heavily influenced by the suspected outliers, making it a less robust starting point in this scenario. Grouping data by customer ID is a form of aggregation that would help identify customers with high transaction frequency, not transactions with unusually high values. Checking for duplicate transaction IDs is an important data cleansing step to ensure data integrity, but it addresses a different type of inconsistency and would not identify a legitimate but unusually large transaction.
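To see why the mean and standard deviation make a weaker starting point here, the following sketch applies a Z-score check to a small, made-up set of transactions. The data, the helper name `z_score_outliers`, and the cutoff of 3 standard deviations are all assumptions chosen for illustration (3 is a common convention, not something the question specifies).

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction values; one is far larger than the rest.
transactions = [42.0, 55.5, 38.0, 61.25, 47.0, 52.0, 49.5, 44.0, 58.0, 2500.0]

# The single large value pulls the mean far above the typical transaction,
# while the median stays close to the bulk of the data.
print("mean:  ", statistics.mean(transactions))
print("median:", statistics.median(transactions))

# The same large value also inflates the standard deviation, so its own
# Z-score stays below the cutoff and nothing is flagged.
print("z-score outliers:", z_score_outliers(transactions))
```

In this small sample the 2500.0 transaction inflates the standard deviation enough that its own Z-score stays under 3, so the check returns an empty list even though the value is clearly unusual.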
CompTIA Data+ DA0-002 (V2)
Data Acquisition and Preparation