K-Means(K-均值) | number of clusters(聚类形成的簇的个数) | 非常大的 n_samples , 中等的 n_clusters 使用 MiniBatch 代码) | 通用, 均匀的 cluster size(簇大小), flat geometry(平面几何), 不是太多的 clusters(簇) | Distances between points(点之间的距离) |
Affinity propagation | damping(阻尼), sample preference(样本偏好) | Not scalable with n_samples(n_samples 不可扩展) | Many clusters, uneven cluster size, non-flat geometry(许多簇,不均匀的簇大小,非平面几何) | Graph distance (e.g. nearest-neighbor graph)(图距离(例如,最近邻图)) |
Mean-shift | bandwidth(带宽) | Not scalable with n_samples (n_samples 不可扩展) | Many clusters, uneven cluster size, non-flat geometry(许多簇,不均匀的簇大小,非平面几何) | Distances between points(点之间的距离) |
Spectral clustering | number of clusters(簇的个数) | 中等的 n_samples , 小的 n_clusters | Few clusters, even cluster size, non-flat geometry(几个簇,均匀的簇大小,非平面几何) | Graph distance (e.g. nearest-neighbor graph)(图距离(例如最近邻图)) |
Ward hierarchical clustering | number of clusters(簇的个数) | 大的 n_samples 和 n_clusters | Many clusters, possibly connectivity constraints(很多的簇,可能连接限制) | Distances between points(点之间的距离) |
Agglomerative clustering | number of clusters(簇的个数), linkage type(链接类型), distance(距离) | 大的 n_samples 和 n_clusters | Many clusters, possibly connectivity constraints, non Euclidean distances(很多簇,可能连接限制,非欧氏距离) | Any pairwise distance(任意成对距离) |
DBSCAN | neighborhood size(neighborhood 的大小) | 非常大的 n_samples , 中等的 n_clusters | Non-flat geometry, uneven cluster sizes(非平面几何,不均匀的簇大小) | Distances between nearest points(最近点之间的距离) |
Gaussian mixtures(高斯混合) | many(很多) | Not scalable(不可扩展) | Flat geometry, good for density estimation(平面几何,适用于密度估计) | Mahalanobis distances to centers( 与中心的马氏距离) |
Birch | branching factor(分支因子), threshold(阈值), optional global clusterer(可选全局簇). | 大的 n_clusters 和 n_samples | Large dataset, outlier removal, data reduction.(大型数据集,异常值去除,数据简化) | Euclidean distance between points(点之间的欧氏距离) |