A data science team has developed a gradient boosting model to predict customer lifetime value (CLV) for a large e-commerce client. The primary business need is to identify the top 20% of customers by predicted value to target them for a new loyalty program. During development, a key marketing stakeholder also expressed a want for a model that predicts the exact future CLV in dollars with less than a 5% margin of error, to be used for precise long-term revenue forecasting.
Final testing of the model yields the following results:
The model demonstrates excellent ranking performance, correctly placing over 90% of future high-value customers within its predicted top 20% segment.
The point prediction of the exact CLV amount has a Mean Absolute Percentage Error (MAPE) of 25%.
Further hyperparameter tuning and alternative model experiments show that reducing the MAPE below 20% is not feasible with the available data.
Given this scenario, which recommendation best demonstrates the data scientist's ability to differentiate between business needs, wants, and reality?
Recommend transforming the model's precise CLV output into broad categories (e.g., "Low", "Medium", "High") and using these for loyalty program targeting to avoid using the inaccurate point predictions.
Advise against using the model in any capacity because the 25% MAPE indicates it is too inaccurate and unreliable for any business decision-making.
Recommend deploying the current model to identify the top 20% of customers, as it successfully meets the core business need. Communicate to stakeholders that a 25% MAPE is a realistic performance benchmark and that the model's strength is in ranking, not precise value prediction.
Postpone the loyalty program launch and reallocate the team's resources to extensive feature engineering and testing complex deep learning architectures to meet the stakeholder's 5% MAPE target.
The correct answer is to recommend deploying the model for its primary purpose of ranking customers while clearly communicating its limited precision for forecasting. This approach directly satisfies the core business need, which is to identify high-value customers for the loyalty program, a task the model performs exceptionally well. It also addresses the stakeholder's 'want' by transparently explaining the 'reality' of the model's performance: a 25% MAPE is a realistic benchmark for this type of complex prediction, and the 5% target is not attainable with the available data. This balances delivering immediate value with managing stakeholder expectations effectively.
Postponing the project to chase an unrealistic metric ignores the immediate value the model can provide and misallocates resources, especially since performance has plateaued. Advising against using the model entirely due to the high MAPE would be a mistake, as it overlooks the model's primary strength and its ability to meet the critical business requirement. Transforming the output into broad categories is a plausible but less optimal step; the model's strong ranking ability is already sufficient for the task, and this approach fails to directly address the stakeholder communication challenge.
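To make the distinction between ranking performance and point-prediction error concrete, here is a minimal Python sketch. The ten CLV values are hand-picked toy numbers, not output from the team's model, and `top_segment_capture` is a hypothetical helper that assumes the ranking metric means "share of the true top 20% that also appears in the predicted top 20%". Every prediction is off by exactly 25%, yet the top segment is identified perfectly.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes nonzero actuals)."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def top_segment_capture(actual, predicted, fraction=0.2):
    """Share of the true top-`fraction` customers found in the predicted top-`fraction`.

    Hypothetical helper: one plausible reading of the ranking metric in the scenario.
    """
    k = max(1, int(len(actual) * fraction))
    by_actual = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)
    by_predicted = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return len(set(by_actual[:k]) & set(by_predicted[:k])) / k


# Toy data (assumed for illustration): each prediction is 25% above or below the actual CLV.
actual = [50, 80, 100, 120, 200, 240, 400, 500, 1000, 2000]
predicted = [62.5, 60, 125, 90, 250, 180, 500, 375, 1250, 1500]

print(f"MAPE: {mape(actual, predicted):.1f}%")  # MAPE: 25.0%
print(f"Top-20% capture: {top_segment_capture(actual, predicted):.0%}")  # Top-20% capture: 100%
```

Because the loyalty program only needs the right customers in the top segment, an error of this size in the dollar amounts does not affect targeting, although it would matter for revenue forecasting.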