A data scientist is evaluating a new fraud detection model. The model is tested on a large dataset where 1% of transactions are known to be fraudulent. The model correctly identifies 95% of all fraudulent transactions (True Positive Rate). However, it also incorrectly flags 2% of legitimate transactions as fraudulent (False Positive Rate). Given that the model has flagged a transaction as fraudulent, what is the probability that the transaction is actually fraudulent?
The correct answer is determined by applying Bayes' theorem, which is used to find the probability of a hypothesis given new evidence. In this scenario, we want to find the posterior probability of a transaction being fraudulent given that the model flagged it as positive, or P(Fraud | Positive).
A = The transaction is actually fraudulent (Fraud).
B = The model flags the transaction as positive (Positive).
P(A) = P(Fraud) = 0.01 (The prior probability of a transaction being fraudulent).
P(B|A) = P(Positive | Fraud) = 0.95 (The probability of a positive flag given a fraudulent transaction; the True Positive Rate).
P(Positive | Not Fraud) = 0.02 (The probability of a positive flag given a legitimate transaction; the False Positive Rate).
P(Not Fraud) = 1 - P(Fraud) = 1 - 0.01 = 0.99.
First, we must calculate the total probability of the model flagging a transaction as fraudulent, P(B) or P(Positive), using the law of total probability:

P(Positive) = P(Positive | Fraud) * P(Fraud) + P(Positive | Not Fraud) * P(Not Fraud)
P(Positive) = (0.95 * 0.01) + (0.02 * 0.99)
P(Positive) = 0.0095 + 0.0198
P(Positive) = 0.0293
Now we can substitute these values into Bayes' theorem:

P(Fraud | Positive) = [P(Positive | Fraud) * P(Fraud)] / P(Positive)
P(Fraud | Positive) = (0.95 * 0.01) / 0.0293
P(Fraud | Positive) = 0.0095 / 0.0293
P(Fraud | Positive) ≈ 0.3242, or about 32.4%
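The calculation above can be verified with a short script. This is a minimal sketch using only the probabilities stated in the problem; the variable names are illustrative, not part of any library.

```python
# Bayes' theorem for the fraud-detection example, using the values from the problem.
p_fraud = 0.01             # P(Fraud): prior probability of fraud
p_pos_given_fraud = 0.95   # P(Positive | Fraud): True Positive Rate
p_pos_given_legit = 0.02   # P(Positive | Not Fraud): False Positive Rate
p_legit = 1 - p_fraud      # P(Not Fraud) = 0.99

# Law of total probability: marginal probability of a positive flag.
p_positive = p_pos_given_fraud * p_fraud + p_pos_given_legit * p_legit

# Bayes' theorem: posterior probability of fraud given a positive flag.
p_fraud_given_positive = (p_pos_given_fraud * p_fraud) / p_positive

print(f"P(Positive)        = {p_positive:.4f}")             # 0.0293
print(f"P(Fraud | Positive) = {p_fraud_given_positive:.4f}") # 0.3242
```

Note how the small prior (1%) dominates the result: even with a 95% True Positive Rate, most flagged transactions are false positives.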
The other options are incorrect for the following reasons:
95.0% is the True Positive Rate, P(Positive | Fraud). Confusing this with the posterior probability P(Fraud | Positive) is a common error known as the prosecutor's fallacy.
67.6% is the probability of the transaction not being fraudulent given a positive flag, or P(Not Fraud | Positive), which is 1 - 0.324.
2.9% is the marginal probability of any transaction receiving a positive flag, P(Positive), which is the denominator in the Bayes' theorem calculation.