While exploring a 2-dimensional dataset that contains two spatial clusters (one very dense and one much sparser), a data scientist tries to find a single (eps, minPts) setting for DBSCAN that correctly identifies both clusters. Every time she preserves the dense cluster, the sparse cluster is either merged into it or labeled as noise; whenever she isolates the sparse cluster, the dense cluster fragments. Which underlying property of DBSCAN most directly causes this limitation?
DBSCAN assumes that all features are statistically independent and identically distributed, so clusters of varying density violate this assumption.
DBSCAN requires the user to specify the exact number of clusters beforehand; supplying the wrong number causes clusters to fragment or merge.
DBSCAN relies on a single global density threshold (eps) that applies to every point, so it cannot accommodate clusters with markedly different densities.
DBSCAN assigns points to clusters by minimizing within-cluster sum of squared errors (SSE), which biases it toward clusters of uniform density.
DBSCAN defines a cluster as a connected set of core points, where each core point has at least minPts neighbors within radius eps. Both eps and minPts are single, global hyperparameters: the same density threshold is applied to every point in the dataset. If clusters differ greatly in density, no single (eps, minPts) pair suits both. An eps small enough to resolve the dense cluster is too small for the sparse cluster, whose points then fail to reach minPts neighbors and are labeled as noise. Conversely, an eps large enough to connect the sparse cluster's points is too permissive for the dense region, so the gap around the dense cluster no longer separates it and the two clusters merge. This is a well-known disadvantage of standard DBSCAN. The other statements are incorrect: DBSCAN does not require the number of clusters in advance, it does not minimize within-cluster SSE, and it makes no independence assumption about the features.
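To make the single-threshold problem concrete, here is a minimal pure-Python sketch of textbook DBSCAN run on a small synthetic dataset. Everything in it is illustrative and assumed: the helper names `region_query`, `dbscan`, `grid` and `summarize` are hypothetical, the data are two 5x5 grids (a dense one with spacing 0.1 and a sparse one with spacing 1.0, placed 0.8 apart), and a point is counted as part of its own eps-neighborhood, the convention scikit-learn uses for `min_samples`. With minPts = 4, a small eps finds only the dense cluster and marks the sparse one as noise, while an eps large enough for the sparse cluster merges both into one.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core: keep expanding
    return labels


def grid(x0, y0, spacing, n=5):
    """Hypothetical helper: n x n grid of points with lower-left corner (x0, y0)."""
    return [(x0 + i * spacing, y0 + j * spacing) for i in range(n) for j in range(n)]


def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    clusters = {label for label in labels if label != NOISE}
    return len(clusters), labels.count(NOISE)


dense = grid(0.0, 0.0, 0.1)   # 25 points, spacing 0.1
sparse = grid(1.2, 0.0, 1.0)  # 25 points, spacing 1.0, 0.8 to the right of dense
points = dense + sparse

for eps in (0.15, 1.5):
    n_clusters, n_noise = summarize(dbscan(points, eps, min_pts=4))
    print(f"eps={eps}: {n_clusters} cluster(s), {n_noise} noise point(s)")
# eps=0.15: 1 cluster(s), 25 noise point(s)
# eps=1.5: 1 cluster(s), 0 noise point(s)
```

In this toy setup the sparse cluster's internal spacing (1.0) is larger than the gap to the dense cluster (0.8), so every eps that makes the sparse points core also bridges the gap. Density-adaptive variants avoid choosing one global eps, but that is beyond this sketch.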