<p>Clustering using Representatives is a Hierarchical clustering technique in which you can represent a cluster using a <strong>set</strong> of well-scattered representative points.</p>
<p>This algorithm has a parameter $\alpha$ which defines the factor of the points in which to shrink towards the centroid.</p>
<p>CURE is known to be robust to outliers and able to identify clusters that have a <strong>non-spherical</strong> shape and size variance.</p>
<p>The clusters with the closest pair of representatives are the clusters that are merged at each step of CURE's algorithm.</p>
<p>This algorithm cannot be directly applied to large datasets due to high runtime complexity. Several enhancements were added to address this requirement</p>
<ul>
<li>Random sampling: This involves a trade off between accuracy and efficiency. One would hope that the random sample they obtain is representative of the population</li>
<li>Partitioning: The idea is to partition the sample space into $p$ partitions</li>
<li>Pick a random sample of points that fit in main memory</li>
<li>Cluster sample points hierarchically to create the initial clusters</li>
<li>Pick representative point<strong>s</strong>
<ol>
<li>For each cluster, pick $k$ representative points, as dispersed as possible</li>
<li>Move each representative points to a fixed fraction $\alpha$ toward the centroid of the cluster</li>
</ol></li>
<li>Rescan the whole dataset and visit each point $p$ in the data set</li>
<li>Place it in the "closest cluster"
<ol>
<li>Closest as in shortest distance among all the representative points.</li>
</ol></li>
</ol>
<h2>TSNE</h2>
<p>TSNE allows us to reduce the dimensionality of a dataset to two which allows us to visualize the data.</p>
<p>It is able to do this since many real-world datasets have a low intrinsic dimensionality embedded within the high-dimensional space. </p>
<p>Since the technique needs to conserve the structure of the data, two corresponding mapped points must be close to each other distance wise as well. Let $|x_i - x_j|$ be the Euclidean distance between two data points, and $|y_i - y<em>j|$ he distance between the map points. This conditional similarity between two data points is:
Where we are considering the <strong>Gaussian distribution</strong> surrounding the distance between $x_j$ from $x_i$ with a given variance $\sigma_i^2$. The variance is different for every point; it is chosen such that points in dense areas are given a smaller variance than points in sparse areas.</p>
<p>Now the similarity matrix for mapped points are