A data scientist at an e-commerce company is analyzing the complete customer journey for a recent marketing campaign. The goal is to understand the flow of users and the magnitude of drop-offs at each stage. The dataset tracks user progression from initial ad impression, to website visit, to adding an item to the cart, and finally to purchase. It also captures how users from different initial sources (e.g., social media, search engine, direct link) navigate through these stages. Which type of visualization is most effective for representing the proportional flow and attrition of users through this multi-stage process?
A Sankey diagram is the correct choice because it is specifically designed to visualize flows and their quantities across multiple stages. The width of the connections (links) between stages (nodes) is proportional to the flow quantity, making it ideal for showing how a total volume is distributed and where losses occur in a process like a customer journey or marketing funnel.
A scatter plot matrix is used to visualize pairwise relationships between multiple numerical variables to identify correlations, not to show flow through sequential stages.
A box and whisker plot is used for univariate analysis to show the distribution of a single numerical variable, including its quartiles, median, and outliers. It cannot represent the flow of quantities between different categorical stages.
A heat map visualizes data in a matrix format, using color to represent the magnitude of values. While useful for showing correlations or the intensity of a phenomenon across two discrete dimensions, it does not effectively illustrate the progression and proportional flow of quantities through a sequence of steps.
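To make the Sankey explanation concrete, here is a minimal sketch of how the campaign data could be turned into the node and link lists a Sankey diagram is drawn from. The stage names follow the question, but every count, the "Drop-off" node, and the helper `build_sankey_links` are hypothetical and used only for illustration. Each link's value is the quantity that sets its width in the diagram.

```python
# Stages of the customer journey described in the question.
STAGES = ["Impression", "Visit", "Add to cart", "Purchase"]

# Hypothetical number of users reaching each stage, per acquisition source.
# These figures are invented for illustration, not taken from a real campaign.
REACHED = {
    "Social media": [5000, 1200, 300, 90],
    "Search engine": [4000, 1600, 480, 200],
    "Direct link": [1000, 700, 280, 140],
}


def build_sankey_links(stages, reached):
    """Return (labels, links), where each link is (source_idx, target_idx, value)."""
    sources = list(reached)
    labels = sources + list(stages) + ["Drop-off"]
    index = {label: i for i, label in enumerate(labels)}
    links = []

    # Each acquisition source feeds its users into the first stage.
    for src in sources:
        links.append((index[src], index[stages[0]], reached[src][0]))

    # Total users reaching each stage, summed across all sources.
    totals = [
        sum(counts[i] for counts in reached.values())
        for i in range(len(stages))
    ]

    # Between consecutive stages, users either continue or drop off.
    for i in range(len(stages) - 1):
        continued = totals[i + 1]
        dropped = totals[i] - totals[i + 1]
        links.append((index[stages[i]], index[stages[i + 1]], continued))
        links.append((index[stages[i]], index["Drop-off"], dropped))

    return labels, links


labels, links = build_sankey_links(STAGES, REACHED)
for source, target, value in links:
    print(f"{labels[source]} -> {labels[target]}: {value}")
```

The three index/value columns in `links` correspond to the `source`, `target`, and `value` arrays that plotting libraries such as Plotly's `go.Sankey` accept, so the same structure can be passed to a renderer. Because every user who leaves a stage either continues or goes to "Drop-off", the flow into each stage equals the flow out of it, which is what lets the diagram show attrition proportionally.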