A data science team at a national retail chain is developing a model to predict daily foot traffic for its physical stores. The current dataset includes historical sales figures, store locations (latitude and longitude), and records of local marketing campaigns. Initial models show low predictive accuracy, and the team concludes that the available features are insufficient to capture the primary drivers of customer visits. To address this, the lead data scientist decides to enrich the dataset with an external data source.
Which of the following external datasets would most directly and significantly improve the model's ability to predict daily foot traffic by addressing the likely feature insufficiency?
A geocoded dataset of public social media posts, containing timestamps and user-generated text from the vicinity of the stores.
National and regional economic indicators, such as consumer price index (CPI) and unemployment rates.
Aggregated human mobility data from location-based services, providing anonymized foot traffic counts and dwell times for specific geographic grid cells.
Census-level demographic data, including population density, income levels, and age distribution for the ZIP codes where the stores are located.
The correct answer is aggregated human mobility data. This type of dataset, often sourced from anonymized location-based services, provides a direct proxy for the target variable (foot traffic) at high spatial and temporal resolution. It captures population movement, dwell times, and how busy an area is on a given day, which are strong predictors of daily activity around a retail location.
Census-level demographic data is too static. It is typically updated only annually or once a decade, so it may help explain differences between stores but not the daily fluctuations at any one store.
Geocoded social media data is often noisy, sparse, and represents a biased sample of the population, making it difficult to extract a reliable signal for general foot traffic.
National and regional economic indicators lack the geographic and temporal granularity required to predict daily foot traffic for a specific store.
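The "too static" and "lacks granularity" arguments can be checked directly by measuring how much a feature varies from day to day within each store. A ZIP-level census value is constant for a store across the days being predicted, and so is a regional indicator that changes only monthly within any given month. Its within-store variance is therefore zero, and it cannot explain why one day differs from the next. At most it explains differences in typical traffic between stores. This is a minimal diagnostic sketch, not a feature-selection method. The field names `zip_pop_density` and `cell_visits` and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def within_store_variance(rows, feature):
    """Average, over stores, of the day-to-day (population) variance of `feature`.

    A result of zero means the feature never changes within a store,
    so it carries no information about daily fluctuations.
    """
    by_store = defaultdict(list)
    for row in rows:
        by_store[row["store_id"]].append(row[feature])
    variances = [pvariance(values) for values in by_store.values()]
    return sum(variances) / len(variances)


if __name__ == "__main__":
    # Hypothetical daily rows: density is fixed per store, visits change daily.
    rows = [
        {"store_id": "A", "zip_pop_density": 8500.0, "cell_visits": 1200.0},
        {"store_id": "A", "zip_pop_density": 8500.0, "cell_visits": 1500.0},
        {"store_id": "A", "zip_pop_density": 8500.0, "cell_visits": 900.0},
        {"store_id": "B", "zip_pop_density": 2100.0, "cell_visits": 400.0},
        {"store_id": "B", "zip_pop_density": 2100.0, "cell_visits": 650.0},
        {"store_id": "B", "zip_pop_density": 2100.0, "cell_visits": 300.0},
    ]
    print(within_store_variance(rows, "zip_pop_density"))  # 0.0
    print(within_store_variance(rows, "cell_visits") > 0)  # True
```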