Clustering
Models
HorseML.Clustering.Kmeans
— Type
Kmeans(K; max=300, th=1e-4)
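K-means clustering. The data are partitioned into K clusters by repeatedly assigning each point to the nearest cluster centroid and recomputing the centroids.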
Parameters:
K
: number of classes
max
: maximum number of iterations
th
: convergence threshold
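As a rough illustration of how `max` and `th` bound the iteration, here is a minimal Lloyd's-algorithm sketch. It is not HorseML's implementation: the name `kmeans_sketch`, the initialization (the first `K` rows become the centroids) and the stopping rule (largest centroid shift below `th`) are assumptions. Rows of `X` are data points, and the sketch returns integer labels rather than the matrix stored in `model.labels`.

```julia
using Statistics: mean

# Minimal Lloyd's k-means sketch (illustrative; not HorseML's implementation).
# X is n × d: one row per data point, one column per feature.
function kmeans_sketch(X::Matrix{Float64}, K::Int; max::Int = 300, th::Float64 = 1e-4)
    n = size(X, 1)
    C = X[1:K, :]                 # assumed init: the first K points are the initial centroids
    labels = zeros(Int, n)
    for _ in 1:max
        # assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        for i in 1:n
            labels[i] = argmin([sum(abs2, X[i, :] .- C[k, :]) for k in 1:K])
        end
        # update step: each centroid moves to the mean of its assigned points
        newC = copy(C)
        for k in 1:K
            members = findall(==(k), labels)
            isempty(members) || (newC[k, :] = vec(mean(X[members, :], dims = 1)))
        end
        shift = maximum(sqrt(sum(abs2, newC[k, :] .- C[k, :])) for k in 1:K)
        C = newC
        shift < th && break       # converged: no centroid moved by th or more
    end
    return labels, C
end
```

On `[0.0 0.0; 10.0 10.0; 0.0 1.0; 1.0 0.0; 10.0 11.0; 11.0 10.0]` with `K = 2`, it returns the labels `[1, 2, 1, 1, 2, 2]`. Taking the first `K` rows is only the simplest deterministic choice; random or k-means++ seeding is more common in practice.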
Example
julia> model = Kmeans(3)
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)
julia> using HorseML.Clustering: fit!
julia> fit!(model, x)
julia> model.labels |> size
(100, 3)
HorseML.Clustering.Xmeans
— Type
Xmeans(; kinit=2, max=300, th=1e-4)
X-means clustering. This algorithm uses the Bayesian information criterion (BIC) to decide whether each cluster should be split in two. As a result, you do not need to specify the number of clusters; it is determined from the data alone.
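The split test described above can be illustrated by comparing the BIC of one spherical Gaussian against two. This is a sketch under assumptions, not HorseML's formula: `bic_spherical` and `should_split` are hypothetical names, each part is modeled as a spherical Gaussian with a maximum-likelihood variance, the two-part model uses hard assignments with mixing weights n_c/n, and BIC is taken as log-likelihood minus (p/2)·log(n), so a larger value is better. In X-means the candidate split would come from running 2-means inside the cluster; here it is passed in directly.

```julia
# Hypothetical helper: BIC (larger is better) of a hard partition of the rows of X,
# modeling each part as a spherical Gaussian with a maximum-likelihood variance.
function bic_spherical(X::Matrix{Float64}, labels::Vector{Int})
    n, d = size(X)
    parts = unique(labels)
    ll = 0.0
    for c in parts
        Xc = X[labels .== c, :]
        nc = size(Xc, 1)
        μ = sum(Xc, dims = 1) ./ nc                     # part mean (1 × d)
        σ2 = sum(abs2, Xc .- μ) / (nc * d) + 1e-12      # ML variance; tiny floor avoids log(0)
        ll += nc * log(nc / n)                          # hard-assignment mixing weight n_c / n
        ll += -nc * d / 2 * log(2π * σ2) - nc * d / 2   # spherical Gaussian log-likelihood at the MLE
    end
    p = length(parts) * (d + 1) + (length(parts) - 1)  # means, variances, free mixing weights
    return ll - p / 2 * log(n)
end

# X-means-style test: split if the two-part BIC beats the one-part BIC.
should_split(X::Matrix{Float64}, split_labels::Vector{Int}) =
    bic_spherical(X, split_labels) > bic_spherical(X, ones(Int, size(X, 1)))
```

With two well-separated groups of four points, `should_split` returns `true`; splitting a single 2×2 grid of points in half returns `false`, because the extra parameters are not paid for by a higher likelihood.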
Parameters:
kinit
: initial number of classes
max
: maximum number of iterations
th
: convergence threshold
Example
julia> model = Xmeans()
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)
julia> using HorseML.Clustering: fit!
julia> fit!(model, x)
julia> model.labels |> size
(100, 3)
HorseML.Clustering.GMM
— Type
GMM(K; max=1e+8, th=1e-4)
Gaussian mixture model. This is useful when the data can be modeled as a mixture of Gaussian distributions, one per cluster.
The parameter initialization used by this method is provisional and may not give good results.
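Gaussian mixtures are usually fitted with expectation-maximization (EM), alternating between computing each point's membership probabilities and re-estimating the component parameters. The sketch below is a one-dimensional illustration under stated assumptions, not HorseML's implementation: `gmm_em_1d` is a hypothetical name, the means start evenly spaced between the smallest and largest value, every variance starts at the data variance, and iteration stops when the log-likelihood changes by less than `th`.

```julia
using Statistics: var

# Hypothetical 1-D Gaussian mixture fitted by EM (illustrative sketch only).
function gmm_em_1d(x::Vector{Float64}, K::Int; max::Int = 1000, th::Float64 = 1e-4)
    n = length(x)
    μ = collect(range(minimum(x), maximum(x), length = K))  # assumed init: evenly spaced means
    σ2 = fill(var(x), K)                                    # every component starts with the data variance
    w = fill(1.0 / K, K)                                    # equal mixing weights
    R = zeros(n, K)                                         # responsibilities R[i, k] = P(component k | x[i])
    prev_ll = -Inf
    for _ in 1:max
        # E-step: responsibilities from log densities, normalized with log-sum-exp
        ll = 0.0
        for i in 1:n
            logp = [log(w[k]) - 0.5 * log(2π * σ2[k]) - (x[i] - μ[k])^2 / (2 * σ2[k]) for k in 1:K]
            m = maximum(logp)
            lse = m + log(sum(exp.(logp .- m)))
            R[i, :] .= exp.(logp .- lse)
            ll += lse
        end
        # M-step: re-estimate weights, means and variances from the responsibilities
        for k in 1:K
            Nk = sum(R[:, k])
            w[k] = Nk / n
            μ[k] = sum(R[:, k] .* x) / Nk
            σ2[k] = sum(R[:, k] .* (x .- μ[k]) .^ 2) / Nk + 1e-10   # small floor keeps σ2 > 0
        end
        abs(ll - prev_ll) < th && break   # converged: log-likelihood barely changed
        prev_ll = ll
    end
    return w, μ, σ2
end
```

For example, on `[0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]` with `K = 2`, the means converge to roughly 0 and 10 with equal weights. A poor starting point can leave EM in a worse local optimum, which is why the initialization matters.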
Parameters:
K
: number of classes
max
: maximum number of iterations
th
: convergence threshold
Example
julia> model = GMM(3)
GMM{3}(Float64[], Matrix{Float64}(undef, 0, 0), Array{Float64, 3}(undef, 0, 0, 0), 100000000, 0.0001)
julia> using HorseML.Clustering: fit!
julia> fit!(model, x)
julia> model.labels |> size
(100, 3)
HorseML.Clustering.HDBSCAN
— Type
HDBSCAN(k, min_cluster_size)
Density-Based Clustering Based on Hierarchical Density Estimates. This algorithm performs clustering as follows.
- generate a minimum spanning tree
- build an HDBSCAN hierarchy
- extract the target clusters
- generate the list of cluster assignments for each point
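The first step, building a minimum spanning tree, can be sketched with Prim's algorithm on a dense distance matrix. In HDBSCAN the edge weights are normally the mutual reachability distances derived from the core distances defined under the `k` parameter; here they are simply any symmetric matrix `D`, and `prim_mst` is a hypothetical name, not part of HorseML's API.

```julia
# Hypothetical helper: Prim's algorithm on a dense symmetric distance matrix D.
# Returns the n - 1 tree edges as (i, j, weight) tuples.
function prim_mst(D::Matrix{Float64})
    n = size(D, 1)
    intree = falses(n)
    best = fill(Inf, n)        # cheapest known edge weight connecting each vertex to the tree
    parent = zeros(Int, n)     # tree vertex at the other end of that edge (0 = none yet)
    best[1] = 0.0              # start the tree from vertex 1
    edges = Tuple{Int,Int,Float64}[]
    for _ in 1:n
        # pick the cheapest vertex not yet in the tree
        v = 0
        for u in 1:n
            if !intree[u] && (v == 0 || best[u] < best[v])
                v = u
            end
        end
        intree[v] = true
        parent[v] != 0 && push!(edges, (parent[v], v, D[parent[v], v]))
        # relax edges leaving the newly added vertex
        for u in 1:n
            if !intree[u] && D[v, u] < best[u]
                best[u] = D[v, u]
                parent[u] = v
            end
        end
    end
    return edges
end
```

For four points on a line at 0, 1, 3 and 7 with `D[i, j] = |p_i - p_j|`, it returns the edges `(1, 2, 1.0)`, `(2, 3, 2.0)` and `(3, 4, 4.0)`. The dense O(n²) form is the natural fit when all pairwise distances are needed anyway, as they are for mutual reachability.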
The details are too involved to explain here. If you want to know more about this algorithm, see these docs.
Parameters:
k
: the "core distance" of a point A is defined as the distance between A and its k-th nearest neighbor.
min_cluster_size
: minimum number of points in a cluster
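Under this definition, the core distance and the mutual reachability distance commonly used by HDBSCAN, max(core(a), core(b), d(a, b)), can be computed as below. This is a sketch: `pairwise_dist`, `core_distances` and `mutual_reachability` are hypothetical names, Euclidean distance is assumed, and a point is not counted as its own neighbor (implementations differ on this).

```julia
# Hypothetical helpers; rows of X are points, Euclidean distance assumed.
pairwise_dist(X::Matrix{Float64}) =
    [sqrt(sum(abs2, X[i, :] .- X[j, :])) for i in 1:size(X, 1), j in 1:size(X, 1)]

# Core distance of point i: distance to its k-th nearest neighbor, not counting i itself.
function core_distances(X::Matrix{Float64}, k::Int)
    D = pairwise_dist(X)
    n = size(X, 1)
    return [sort(D[i, [j for j in 1:n if j != i]])[k] for i in 1:n]
end

# Mutual reachability distance: max(core(a), core(b), d(a, b)), with 0 on the diagonal.
function mutual_reachability(X::Matrix{Float64}, k::Int)
    D = pairwise_dist(X)
    c = core_distances(X, k)
    n = size(X, 1)
    return [i == j ? 0.0 : max(c[i], c[j], D[i, j]) for i in 1:n, j in 1:n]
end
```

For the 1-D points 0, 1, 3 and 7, `k = 1` gives core distances `[1.0, 1.0, 2.0, 4.0]` and `k = 2` gives `[3.0, 2.0, 3.0, 6.0]`. A larger k raises the core distances, and with them the mutual reachability distances of points in sparse regions.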
Example
julia> model = HDBSCAN(3, 5)
HDBSCAN(3, 5, #undef)
julia> using HorseML.Clustering: fit!
julia> fit!(model, data); # returns the minimum spanning tree of the data built during clustering
julia> model.labels |> size
(100,)
HorseML.Clustering.DBSCAN
— Type
DBSCAN(ε, minpts)
Density-based spatial clustering of applications with noise. In short, if the number of neighbors of a data point is more than minpts, the cluster is extended through that point; if not, the cluster stops growing there.
Parameters:
ε
: maximum distance at which two points count as neighbors
minpts
: minimum number of neighbors a data point needs to extend a cluster
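The expansion rule above can be sketched as follows. This is an illustration, not HorseML's implementation: `dbscan_sketch` is a hypothetical name, a point's ε-neighborhood is assumed to include the point itself, a point extends its cluster when it has at least `minpts` such neighbors (implementations differ on both conventions), and label 0 marks noise.

```julia
# Minimal DBSCAN sketch (illustrative; not HorseML's implementation).
# X is n × d with one row per point; returns one label per point, 0 meaning noise.
function dbscan_sketch(X::Matrix{Float64}, ε::Float64, minpts::Int)
    n = size(X, 1)
    labels = zeros(Int, n)
    visited = falses(n)
    # ε-neighborhood of point i (assumed to include i itself)
    neighbors(i) = [j for j in 1:n if sum(abs2, X[i, :] .- X[j, :]) <= ε^2]
    cluster = 0
    for i in 1:n
        visited[i] && continue
        visited[i] = true
        N = neighbors(i)
        length(N) < minpts && continue      # not a core point: stays noise unless a cluster reaches it
        cluster += 1
        labels[i] = cluster
        queue = copy(N)
        while !isempty(queue)
            j = popfirst!(queue)
            labels[j] == 0 && (labels[j] = cluster)   # claim unlabeled (border or noise) points
            visited[j] && continue
            visited[j] = true
            Nj = neighbors(j)
            length(Nj) >= minpts && append!(queue, Nj)   # j is a core point: keep extending
        end
    end
    return labels
end
```

For the seven points `[0.0 0.0; 0.0 0.1; 0.1 0.0; 5.0 5.0; 5.0 5.1; 5.1 5.0; 20.0 20.0]` with `ε = 0.3` and `minpts = 3`, it returns `[1, 1, 1, 2, 2, 2, 0]`: two clusters and one noise point.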
Example
julia> model = DBSCAN(0.3, 5)
DBSCAN(0.09, 5, 0, Point[])
julia> model(x) |> size
(100, 3)
Other
HorseML.Clustering.fit!
— Function
fit!(model, x)
Fit a clustering model (e.g. Kmeans or GMM) to the data x.
The size of x is (number of data points, number of features).
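For instance, a matrix in this layout can be assembled from a list of point vectors with standard Julia functions. This is a generic sketch with illustrative values; the resulting `x` is what would be passed as `fit!(model, x)`.

```julia
# Illustrative points, one vector per data point (2 features each).
points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Stack them as rows so that size(x) == (number of data, number of features).
x = reduce(vcat, permutedims.(points))   # 3×2 Matrix{Float64}

# Data stored the other way round (features × data) can be flipped with permutedims.
xt = permutedims(x)                      # 2×3; permutedims(xt) restores the 3×2 layout
```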