# Cluster Analysis Spring 2018

### Distance, Dimensionality Reduction, and Tendency

- Distance
- Euclidean Distance
- Squared Euclidean Distance
- Manhattan Distance
- Maximum Distance
- Mahalanobis Distance
- Which distance function should you use?
- PCA
- Cluster Tendency
- Hopkins Statistic
- Scaling Data

### Validating Clustering Models

- Clustering Validation
- Cross Validation

### Connectivity Models

- Agglomerative Clustering
- Single Linkage Clustering
- Complete Linkage Clustering
- Unweighted Pair Group Method with Arithmetic Mean (If time permits)
- Dendrograms
- Divisive Clustering
- CURE (Clustering using REpresentatives) algorithm (If time permits)

### Cluster Evaluation

- Internal Evaluation
- Dunn Index
- Silhouette Coefficient
- Davies-Bouldin Index (If time permits)
- External Evaluation
- Rand Measure
- Jaccard Index
- Dice Index
- Confusion Matrix
- F Measure (If time permits)
- Fowlkes-Mallows Index (If time permits)

### Centroid Models

- Jenks Natural Breaks Optimization
- Voronoi Diagram
- K-means clustering
- K-medoids clustering
- K-medians/modes clustering
- When to use K-means as opposed to K-medoids or K-medians?
- How many clusters should you use?
- Lloyd's Algorithm for Approximating K-means (If time permits)

### Density Models

- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- OPTICS (Ordering Points To Identify the Clustering Structure)
- DeLi-Clu (Density-Link-Clustering) (If time permits)
- What should be your density threshold?

### Analysis of Model Appropriateness

- When do we use each of the models above?

### Distribution Models (If time permits)

- Fuzzy Clusters
- EM (Expectation Maximization) Clustering
- Maximum Likelihood Gaussian
- Probabilistic Hierarchical Clustering

## Textbooks

Cluster Analysis 5th Edition

- Authors: Brian S. Everitt, Sabine Landau, Morven Leese, Daniel Stahl
- ISBN-13: 978-0470749913
- Cost: Free on UMW Library Site
- Amazon Link: https://www.amazon.com/Cluster-Analysis-Brian-S-Everitt/dp/0470749911/ref=sr_1_1?ie=UTF8&qid=1509135983&sr=8-1
- Table of Contents: http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html

Cluster Analysis: 2014 Edition (Statistical Associates Blue Book Series 24)

- Author: David Garson
- ISBN: 978-1-62638-030-1
- Cost: Free with Site Registration
- Website: http://www.statisticalassociates.com/clusteranalysis.htm

## Schedule

Below are my estimates of how long each topic would take to learn in an ideal world. Of course, you have more experience with how long these topics actually take to learn, so I'll leave this mostly to your discretion.

**Distance, Dimensionality Reduction, and Tendency** -- 3 Weeks

**Validating Clustering Models** -- 1 Week

**Connectivity Models** -- 2 Weeks

**Cluster Evaluation** -- 1 Week

**Centroid Models** -- 3 Weeks

**Density Models** -- 3 Weeks

**Analysis of Model Appropriateness** -- 1 Week

The schedule above accounts for 14 weeks, which leaves one free week as a buffer.

## Conclusion

Creating this document got me really excited for this independent study. Feel free to give me feedback :)