Clustering

Models

HorseML.Clustering.Kmeans - Type
Kmeans(K; max=300, th=1e-4)

Kmeans method.

Parameters:

  • K: number of clusters
  • max: maximum number of iterations
  • th: convergence threshold
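
The roles of max and th can be illustrated with a minimal sketch of Lloyd's iteration. This is not HorseML's implementation: the function name lloyd_sketch, the one-observation-per-row layout, the seeding from the first K rows, and the stopping rule (largest centroid coordinate shift below th) are all assumptions made for illustration.

```julia
# Hypothetical helper, not part of HorseML: a bare-bones Lloyd iteration.
# x holds one observation per row; the first K rows seed the centroids.
function lloyd_sketch(x::AbstractMatrix{<:Real}, K::Integer; max=300, th=1e-4)
    n = size(x, 1)
    centroids = Float64.(x[1:K, :])
    labels = zeros(Int, n)
    for _ in 1:max
        # Assignment step: each point joins its nearest centroid.
        for i in 1:n
            dists = [sum(abs2, x[i, :] .- centroids[k, :]) for k in 1:K]
            labels[i] = argmin(dists)
        end
        # Update step: move each centroid to the mean of its members.
        updated = copy(centroids)
        for k in 1:K
            members = findall(==(k), labels)
            isempty(members) && continue   # an empty cluster keeps its centroid
            updated[k, :] = vec(sum(x[members, :], dims=1)) ./ length(members)
        end
        # Stop once no centroid coordinate moved by th or more.
        shift = maximum(abs.(updated .- centroids))
        centroids = updated
        shift < th && break
    end
    return labels, centroids
end
```

In this sketch, max only bounds the loop, while th decides when the centroids are considered stable; a smaller th generally means more iterations before stopping.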

Example

julia> model = Kmeans(3)
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.Xmeans - Type
Xmeans(; kinit=2, max=300, th=1e-4)

Xmeans method. This algorithm uses the Bayesian information criterion (BIC) to decide whether a cluster should be split in two. In other words, you do not need to specify the number of clusters; the data alone is enough.

Parameters:

  • kinit: initial number of clusters
  • max: maximum number of iterations
  • th: convergence threshold
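
For reference, the criterion behind the split test can be written out. Under the common convention where a lower value is better, with p free parameters, n data points, and maximized likelihood \hat{L}, the BIC of a model is as follows. The symbols p, n, and \hat{L} are introduced here for illustration, and the sign convention HorseML uses is not stated in this documentation.

```latex
\mathrm{BIC} = p \ln n - 2 \ln \hat{L}
```

With this convention, a cluster would be split when the two-cluster model of its points has a lower BIC than the one-cluster model. Some descriptions of X-means use the opposite sign, in which case the comparison is reversed.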

Example

julia> model = Xmeans()
Kmeans{3}(Matrix{Float64}(undef, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.GMM - Type
GMM(K; max=1e+8, th=1e-4)

Gaussian mixture model. This is useful when the data follow Gaussian distributions.

Warning

The parameter initialization for this method is tentative and may not give good results.

Parameters:

  • K: number of clusters
  • max: maximum number of iterations
  • th: convergence threshold

Example

julia> model = GMM(3)
GMM{3}(Float64[], Matrix{Float64}(undef, 0, 0), Array{Float64, 3}(undef, 0, 0, 0), 100000000, 0.0001)

julia> using HorseML.Clustering: fit!

julia> fit!(model, x)

julia> model.labels |> size
(100, 3)

HorseML.Clustering.HDBSCAN - Type
HDBSCAN(k, min_cluster_size)

Density-Based Clustering Based on Hierarchical Density Estimates. This algorithm performs clustering as follows.

  1. generate a minimum spanning tree
  2. build the HDBSCAN hierarchy
  3. extract the target clusters
  4. generate the list of cluster assignments for each point
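
Step 1 can be sketched with Prim's algorithm on a dense matrix of pairwise edge weights (in HDBSCAN these are usually mutual reachability distances). The function name prim_mst and the tuple-based edge format are assumptions for illustration, not HorseML's implementation or return type.

```julia
# Hypothetical helper: Prim's algorithm on a dense symmetric weight matrix w.
# Returns the tree as (parent, child, weight) tuples.
function prim_mst(w::AbstractMatrix{<:Real})
    n = size(w, 1)
    in_tree = falses(n)
    in_tree[1] = true
    best = Float64.(w[1, :])      # cheapest known edge from the tree to each node
    parent = ones(Int, n)         # tree endpoint of that cheapest edge
    edges = Tuple{Int,Int,Float64}[]
    for _ in 2:n
        # Pick the cheapest node that is not in the tree yet.
        candidates = findall(!, in_tree)
        j = candidates[argmin(best[candidates])]
        push!(edges, (parent[j], j, best[j]))
        in_tree[j] = true
        # Relax the edges leaving the newly added node.
        for v in candidates
            if v != j && w[j, v] < best[v]
                best[v] = w[j, v]
                parent[v] = j
            end
        end
    end
    return edges
end
```

Removing the tree's edges from heaviest to lightest is one way the hierarchy in step 2 can then be built, since each removal splits a component in two.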

The details are too involved to explain here. If you want to know more about this algorithm, read this documentation.

Parameters:

  • k: neighbor rank used for the core distance. The "core distance of point A" is defined as the distance between point A and its k-th nearest neighbor.
  • min_cluster_size: minimum number of points in a cluster
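
The core distance defined for k can be made concrete with a short sketch, together with the mutual reachability distance that HDBSCAN derives from it. The helper names euclid, core_distance, and mutual_reachability are hypothetical, points are assumed to be the rows of x, and a point is assumed not to count as its own neighbor (implementations differ on this).

```julia
# Hypothetical helpers, not HorseML's API. Points are the rows of x.
euclid(a, b) = sqrt(sum(abs2, a .- b))

# Core distance: distance from point i to its k-th nearest other point.
function core_distance(x::AbstractMatrix{<:Real}, i::Integer, k::Integer)
    d = [euclid(x[i, :], x[j, :]) for j in 1:size(x, 1) if j != i]
    return sort(d)[k]
end

# Mutual reachability distance: the largest of the two core distances
# and the plain distance between the points.
function mutual_reachability(x, i, j, k)
    return max(core_distance(x, i, k), core_distance(x, j, k),
               euclid(x[i, :], x[j, :]))
end
```

A larger k inflates core distances in sparse regions, which pushes isolated points further away from dense groups.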

Example

julia> model = HDBSCAN(3, 5)
HDBSCAN(3, 5, #undef)

julia> fit!(model, data); # fit! returns the minimum spanning tree of the data, generated during the clustering process.

julia> model.labels |> size
(100,)

HorseML.Clustering.DBSCAN - Type
DBSCAN(ε, minpts)

Density-based spatial clustering of applications with noise. In short, if a data point has more than minpts neighbors, its cluster is extended; otherwise, a new cluster is established.

Parameters:

  • ε: maximum distance between neighbors
  • minpts: minimum number of neighbors of a data point
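
How ε and minpts work together can be shown with a small neighborhood query. This is a sketch under assumptions, not HorseML's code: the names region_query and is_core are hypothetical, points are the rows of x, a point does not count as its own neighbor, and the "more than minpts" wording above is taken literally (many implementations use "at least" and count the point itself).

```julia
# Hypothetical helper, not HorseML's API: indices of points within ε of point i.
function region_query(x::AbstractMatrix{<:Real}, i::Integer, ε::Real)
    # Comparing squared distances to ε^2 avoids a square root per pair.
    return [j for j in 1:size(x, 1)
            if j != i && sum(abs2, x[i, :] .- x[j, :]) <= ε^2]
end

# A point with more than minpts neighbors is dense enough to extend a cluster.
is_core(x, i, ε, minpts) = length(region_query(x, i, ε)) > minpts
```

Comparing against ε^2 is a common optimization, and it may be why the example below prints 0.09 for an input of 0.3, although that is a guess about the stored value.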

Example

julia> model = DBSCAN(0.3, 5)
DBSCAN(0.09, 5, 0, Point[])

julia> model(x) |> size
(100, 3)

Other