A data science team is preparing a quarterly business review for executive stakeholders and product managers. The team has analyzed a large dataset of user interactions on their e-commerce platform over the past 12 months. The dataset contains the following key fields: UserID, TransactionTimestamp, ProductCategory, and TransactionAmount.
The primary goal of the presentation is to communicate two key insights in a single, cohesive visualization:
The flow of revenue between different product categories (e.g., how revenue from customers who first bought 'Electronics' later flowed to 'Home Goods').
The proportional contribution of each product category to the total revenue, including how that revenue is distributed in subsequent purchases.
Which visualization is the MOST appropriate for effectively representing these specific, multi-dimensional insights to a mixed audience?
A scatter plot matrix showing pairwise relationships between the revenues of all product categories.
A heatmap of co-purchase revenue, where rows and columns are product categories and cell color indicates the total revenue for customers who bought from both.
A Sankey diagram.
A stacked bar chart, with product categories on the x-axis and revenue on the y-axis, segmented by the subsequent category purchased.
The correct answer is a Sankey diagram. Sankey diagrams are specifically designed to visualize flow and magnitude between different states or categories. In this scenario, the product categories serve as the nodes, and the links between them represent the flow of revenue from one category to another. The width of these links is proportional to the amount of revenue, effectively showing both the path of customer spending and the proportional contribution of each category and flow to the total.
A stacked bar chart is incorrect because while it can show part-to-whole compositions, it is not effective at illustrating the complex, multi-directional flow of revenue between multiple categories simultaneously. A heatmap is also incorrect; it could show the strength of co-purchase relationships between category pairs but would not illustrate the directional flow or the overall distribution path of the revenue. A scatter plot matrix is unsuitable as it is designed to explore pairwise correlations between multiple continuous variables, not to visualize the flow of a value through a system of categorical nodes.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Sankey diagram, and why is it suitable for visualizing flows?
Open an interactive chat with Bash
What are the limitations of using stacked bar charts for this analysis?
Open an interactive chat with Bash
How does a heatmap differ from a Sankey diagram in representing data like this?