A data-science team is segmenting 10 000 customers represented by 85-dimensional feature vectors. They run k-means for k = 2 through 10 and obtain the metrics below (inertia is the within-cluster sum of squared errors):
k  | Inertia (×10⁴) | Avg. Silhouette
2  | 7.9            | 0.47
3  | 5.2            | 0.62
4  | 4.1            | 0.59
5  | 3.6            | 0.56
6  | 3.3            | 0.53
7  | 3.0            | 0.50
8  | 2.8            | 0.49
9  | 2.6            | 0.48
10 | 2.5            | 0.47
Inertia shows an elbow at k = 4, whereas the average silhouette width peaks at k = 3 and then declines. Management wants clusters that are internally coherent and well separated while avoiding unnecessary splits. Which value of k should be chosen based on these results and accepted best practices?
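For context before the answer options, here is a minimal sketch of how such a metric sweep could be produced. It assumes scikit-learn; the random matrix below is only a placeholder for the team's actual 10 000 × 85 customer data, so the printed numbers will not match the table.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder data: substitute the real 10 000 x 85 customer feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 85))

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: within-cluster sum of squared distances to the assigned centroid
    inertia = km.inertia_
    # Average silhouette width; subsampling keeps the O(n^2) distance cost manageable.
    sil = silhouette_score(X, km.labels_, sample_size=1_000, random_state=0)
    print(f"k={k:2d}  inertia={inertia:,.0f}  silhouette={sil:.2f}")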
k = 2: prefer fewer clusters to minimize the risk of over-fitting the data.
k = 4: pick the elbow point where inertia first shows diminishing returns.
k = 10: use the value that gives the smallest possible inertia.
k = 3: select the cluster count that maximizes the average silhouette width.
The silhouette coefficient simultaneously measures cohesion (how tightly points group within their own cluster) and separation (how far apart clusters are); its maximum therefore signals the most distinct grouping. Inertia, by contrast, necessarily decreases as k grows, because every additional centroid can only shorten points' distances to their nearest centre, so the elbow heuristic is only a rule of thumb and can be ambiguous. In the table the global maximum silhouette (0.62) occurs at k = 3; beyond that the score falls, indicating that each extra cluster reduces separation more than it improves compactness. Selecting k = 3 therefore best satisfies the stated goal of coherent, well-separated clusters without unnecessary splits. Choosing k = 4 follows the elbow but ignores the deteriorating silhouette; k = 2 yields a markedly lower silhouette (0.47), meaning weaker cohesion and separation; and k = 10 severely over-partitions the data despite having the lowest inertia.
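For reference, the average silhouette width in the table is the mean over all points of the standard per-point silhouette coefficient, which combines exactly the two quantities the explanation mentions:

% a(i): mean distance from point i to the other points in its own cluster (cohesion)
% b(i): mean distance from point i to the points of the nearest other cluster (separation)
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1

Values near 1 indicate points that are much closer to their own cluster than to the nearest neighbouring cluster, so the 0.62 at k = 3 represents the best cohesion/separation trade-off among the candidate values of k.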