Clustering

Models

HorseML.Clustering.Kmeans — Type

Kmeans(K; max=300, th=1e-4)

Kmeans method.

Parameters:

K: number of classes
max: maximum number of repetitions
th: convergence threshold
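
To illustrate how max and th typically interact, here is a minimal sketch of Lloyd's k-means iteration in plain Julia. It is not HorseML's implementation: the function name lloyd, the explicit initial centers argument, and the choice to compare th against the total shift of the centers are assumptions made for illustration only.

```julia
using LinearAlgebra

# Hypothetical helper, not part of HorseML: plain Lloyd iteration.
# x is (number of data, number of features); centers is (K, number of features).
function lloyd(x, centers; max=300, th=1e-4)
    labels = zeros(Int, size(x, 1))
    for _ in 1:max
        # Assignment step: each point takes the index of its nearest center.
        for i in axes(x, 1)
            labels[i] = argmin([norm(x[i, :] .- centers[k, :]) for k in axes(centers, 1)])
        end
        # Update step: move each center to the mean of its members.
        new_centers = copy(centers)
        for k in axes(centers, 1)
            members = x[labels .== k, :]
            if !isempty(members)
                new_centers[k, :] = vec(sum(members; dims=1)) ./ size(members, 1)
            end
        end
        # Assumed stopping rule: stop once the centers move less than th.
        shift = norm(new_centers .- centers)
        centers = new_centers
        shift < th && break
    end
    return centers, labels
end

x = [0.0 0.0; 0.0 1.0; 10.0 10.0; 10.0 11.0]
centers, labels = lloyd(x, [0.0 0.0; 10.0 10.0])
```

In this sketch max only bounds the loop; on easy data the th check ends the iteration much earlier.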

Example

julia> model = Kmeans(3)
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.Xmeans — Type

Xmeans(; kinit=2, max=300, th=1e-4)

Xmeans method. This algorithm uses the Bayesian information criterion (BIC) to decide whether a cluster should be split in two. As a result, you do not need to specify the number of clusters; the data alone is enough to run the clustering.
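
The split decision can be illustrated with a simplified one-dimensional sketch. This is not HorseML's implementation: the helper names (gauss_loglik, bic, should_split), the hard split at the mean, and the parameter counts are assumptions chosen to keep the example short. It also omits guards for empty or zero-variance halves.

```julia
using Statistics

# Hypothetical helpers, not HorseML functions.
# Log-likelihood of 1-D points under a Gaussian with mean μ and variance σ2.
gauss_loglik(xs, μ, σ2) = sum(-0.5 * log(2π * σ2) - (x - μ)^2 / (2 * σ2) for x in xs)

# BIC in the "lower is better" convention: p * ln(n) - 2 * ln(L).
bic(loglik, p, n) = p * log(n) - 2 * loglik

function should_split(xs)
    n = length(xs)
    m = mean(xs)
    # One cluster: a single Gaussian with 2 free parameters (mean, variance).
    bic1 = bic(gauss_loglik(xs, m, var(xs; corrected=false)), 2, n)
    # Two clusters: hard split at the mean, one Gaussian per side plus a
    # mixing weight, giving 5 free parameters in total.
    halves = (filter(<(m), xs), filter(>=(m), xs))
    loglik2 = sum(length(c) * log(length(c) / n) +
                  gauss_loglik(c, mean(c), var(c; corrected=false)) for c in halves)
    # Split only when the two-cluster model has the lower BIC.
    return bic(loglik2, 5, n) < bic1
end

separated = vcat(0.0:0.1:1.0, 10.0:0.1:11.0)  # two well-separated groups
uniform = collect(0.0:0.1:2.1)                # one evenly spread group
```

Here should_split(separated) favors the split, while should_split(uniform) does not, because the extra parameters of the two-cluster model are penalized.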

Parameters:

kinit: initial number of classes
max: maximum number of repetitions
th: convergence threshold

Example

julia> model = Xmeans()
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.GMM — Type

GMM(K; max=1e+8, th=1e-4)

Gaussian Mixture Model. This is useful when the data follow a Gaussian distribution.

Warning

The parameter initialization used by this method is tentative and may not give good results.

Parameters:

K: number of classes
max: maximum number of repetitions
th: convergence threshold
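
As a rough illustration of what max and th control in a mixture model, the following sketch runs expectation-maximization on one-dimensional data. It is not HorseML's implementation: the names normpdf and em_1d, the explicit initial parameters, and the use of th as a bound on the change in log-likelihood are assumptions.

```julia
# Hypothetical helpers, not HorseML functions.
normpdf(x, μ, σ2) = exp(-(x - μ)^2 / (2 * σ2)) / sqrt(2π * σ2)

function em_1d(xs, μ, σ2, w; max=100_000_000, th=1e-4)
    γ = zeros(length(xs), length(μ))
    prev = -Inf
    for _ in 1:max
        # E-step: unnormalized responsibility of each component for each point.
        for (i, x) in enumerate(xs), k in eachindex(μ)
            γ[i, k] = w[k] * normpdf(x, μ[k], σ2[k])
        end
        loglik = sum(log.(sum(γ; dims=2)))
        γ ./= sum(γ; dims=2)  # normalize so that every row sums to 1
        # M-step: re-estimate means, variances, and mixing weights.
        nk = vec(sum(γ; dims=1))
        μ = (γ' * xs) ./ nk
        σ2 = [sum(γ[:, k] .* (xs .- μ[k]) .^ 2) / nk[k] for k in eachindex(μ)]
        w = nk ./ length(xs)
        # Assumed stopping rule: the log-likelihood changed by less than th.
        abs(loglik - prev) < th && break
        prev = loglik
    end
    return μ, σ2, w, γ
end

xs = vcat(0.0:0.1:1.0, 10.0:0.1:11.0)
μ, σ2, w, γ = em_1d(xs, [0.0, 10.0], [1.0, 1.0], [0.5, 0.5])
```

The matrix γ holds one row per data point and one column per component, with each row summing to 1.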

Example

julia> model = GMM(3)
GMM{3}(Float64[], Matrix{Float64}(undef, 0, 0), Array{Float64, 3}(undef, 0, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.HDBSCAN — Type

HDBSCAN(k, min_cluster_size)

Density-Based Clustering Based on Hierarchical Density Estimates. This algorithm performs clustering as follows.

generate a minimum spanning tree
build a HDBSCAN hierarchy
extract the target cluster
generate the list of cluster assignment for each point

The details are too complex to explain here. If you want to know more about this algorithm, read these docs.

Parameters:

k: the "core distance" of a point A is defined as the distance between A and its k-th nearest neighbor.
min_cluster_size: minimum number of points in a cluster
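
The core distance defined for k can be sketched in a few lines of plain Julia. The helper names core_distance and mutual_reachability are hypothetical and are not HorseML functions. Excluding the point itself from its neighbors is an assumption, and the mutual reachability distance is shown because it is the edge weight conventionally used when HDBSCAN builds its minimum spanning tree.

```julia
using LinearAlgebra

# Hypothetical helpers, not HorseML functions.
# Core distance of point i: distance to its k-th nearest neighbor,
# not counting the point itself (an assumption of this sketch).
function core_distance(x, i, k)
    d = sort([norm(x[i, :] .- x[j, :]) for j in axes(x, 1) if j != i])
    return d[k]
end

# Mutual reachability distance between points a and b.
function mutual_reachability(x, a, b, k)
    return max(core_distance(x, a, k), core_distance(x, b, k), norm(x[a, :] .- x[b, :]))
end

x = reshape([0.0, 1.0, 3.0, 7.0], :, 1)  # four 1-D points, one per row
cores = [core_distance(x, i, 2) for i in axes(x, 1)]
```

Points in sparse regions get a large core distance, which pushes them away from every other point in the mutual reachability metric.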

Example

julia> model = HDBSCAN(3, 5)
HDBSCAN(3, 5, #undef)

julia> fit!(model, data); # This call returns the minimum spanning tree of the data generated during the clustering process.

julia> model.labels |> size
(100,)

HorseML.Clustering.DBSCAN — Type

DBSCAN(ε, minpts)

Density-based spatial clustering of applications with noise. In short, if a data point has more than minpts neighbors, the cluster is extended; otherwise, a new cluster is established.

Parameters:

ε: maximum distance between neighbors
minpts: minimum number of neighbors of a data point
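
The roles of ε and minpts can be illustrated with a small neighbor-counting sketch. The helper names neighbors and can_extend are hypothetical and are not HorseML functions. The strict "more than minpts" comparison follows the description above; other DBSCAN formulations use "at least minpts" and count the point itself.

```julia
using LinearAlgebra

# Hypothetical helpers, not HorseML functions.
# Indices of all other points within distance ε of point i.
neighbors(x, i, ε) = [j for j in axes(x, 1) if j != i && norm(x[i, :] .- x[j, :]) <= ε]

# A point can extend a cluster when it has more than minpts neighbors.
can_extend(x, i, ε, minpts) = length(neighbors(x, i, ε)) > minpts

x = [0.0 0.0; 0.1 0.0; 0.0 0.1; 5.0 5.0]  # three close points and one far away
```

With ε = 0.3 and minpts = 1, the first point has two neighbors and can extend a cluster, while the isolated fourth point cannot.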

Example

julia> model = DBSCAN(0.3, 5)
DBSCAN(0.09, 5, 0, Point[])

julia> model(x) |> size
(100, 3)

Other

HorseML.Clustering.fit! — Function

fit!(model, x)

Fit a Kmeans or GMM model to the data.

The size of x is (number of data, number of features).
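
The examples on this page use a matrix x without defining it. A minimal sketch of building one with the expected layout, using synthetic data chosen only for illustration, could look like this:

```julia
using Random

Random.seed!(42)  # fixed seed so the sketch is reproducible

# Two synthetic groups of 50 points each, with 2 features per point.
# Rows are data points and columns are features.
x = vcat(randn(50, 2), randn(50, 2) .+ 5.0)

size(x)
```

A matrix built this way has one row per data point and can be passed as the second argument of fit!.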
