While exploring a high-dimensional customer-behavior dataset, an engineer applies t-distributed stochastic neighbor embedding (t-SNE) with perplexity = 5 and obtains a 2-D map in which a single known customer segment is broken into several tiny islands that are unrealistically far apart. Without drastically increasing run time or changing the distance metric, which hyper-parameter change is MOST likely to merge those islands into one contiguous cluster?
Increase the perplexity value toward the 30-50 range so that each point considers more nearest neighbors.
Cut the maximum number of optimization iterations in half to reduce the risk of overfitting the embedding.
Reduce the early-exaggeration factor from 12 to 2 so clusters start closer together.
Lower the learning rate from 200 to around 10 to slow the gradient-descent updates.
Perplexity controls the effective number of nearest neighbors each point considers when t-SNE builds its probability distributions. A very low value (such as 5) makes the algorithm focus on extremely local structure, which can fragment a continuous group into many small islands. Increasing perplexity toward the typical 30-50 range expands the neighborhood size, encouraging related points to stay together and restoring the expected single cluster. The learning rate and iteration count mainly affect optimization dynamics, while early exaggeration adjusts the spacing between already-formed clusters; none of those changes resolves the over-fragmentation caused by too small a perplexity.
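As a minimal illustration of the fix, the sketch below re-runs the embedding with a larger perplexity using scikit-learn's TSNE. The data, variable names, and parameter values here are assumptions for demonstration only; they are not part of the original scenario.

# Sketch: comparing a too-small perplexity with one in the typical 30-50 range.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in for the high-dimensional customer-behavior matrix.
X, _ = make_blobs(n_samples=600, n_features=50, centers=4, random_state=0)

# Too-small perplexity: each point attends to only ~5 neighbors, so a single
# segment can shatter into several small, widely separated islands.
emb_fragmented = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)

# Larger perplexity widens each point's effective neighborhood, which tends to
# pull the islands back into one contiguous cluster without changing the metric
# or meaningfully increasing run time.
emb_merged = TSNE(n_components=2, perplexity=40, random_state=0).fit_transform(X)

print(emb_fragmented.shape, emb_merged.shape)  # (600, 2) (600, 2)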