BisectingKMeansModel

class pyspark.mllib.clustering.BisectingKMeansModel(java_model)

A clustering model derived from the bisecting k-means method.

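>>> from numpy import array
>>> from pyspark.mllib.clustering import BisectingKMeans
>>> # sc is an existing SparkContext
>>> data = array([0.0,0.0, 1.0,1.0, 9.0,8.0, 8.0,9.0]).reshape(4, 2)
>>> bskm = BisectingKMeans()
>>> model = bskm.train(sc.parallelize(data, 2), k=4)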
>>> p = array([0.0, 0.0])
>>> model.predict(p)
0
>>> model.k
4
>>> model.computeCost(p)
0.0

New in version 2.0.0.
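The class description only names the bisecting k-means method. Roughly, the method is divisive: it starts with every point in a single cluster and repeatedly splits one cluster in two with ordinary 2-means until there are k clusters. The pure-Python code that follows is a minimal sketch of that idea under stated assumptions, not Spark's implementation. It assumes Euclidean distance. The helper names (sq_dist, mean, sse, two_means, bisecting_kmeans), the deterministic seeding, and the rule of splitting the cluster with the largest sum of squared errors are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points: List[Point]) -> float:
    """Within-cluster sum of squared distances to the cluster mean."""
    m = mean(points)
    return sum(sq_dist(p, m) for p in points)


def two_means(points: List[Point], iters: int = 10) -> Tuple[List[Point], List[Point]]:
    """Split points into two groups with Lloyd's algorithm for k = 2.

    Seeding is deterministic (an assumption of this sketch): the first point
    and the point farthest from it.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left: List[Point] = list(points)
    right: List[Point] = []
    for _ in range(iters):
        # Assign each point to the closer of the two centers (ties go left).
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # cannot split, e.g. all points are identical
        # Move each center to the mean of its assigned points.
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points: List[Point], k: int) -> List[List[Point]]:
    """Divisive clustering: bisect one cluster at a time until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two points can be split.
        divisible = [i for i, c in enumerate(clusters) if len(c) > 1]
        if not divisible:
            break
        # Hypothetical selection rule: split the cluster with the largest SSE.
        i = max(divisible, key=lambda j: sse(clusters[j]))
        left, right = two_means(clusters[i])
        if not left or not right:
            break
        clusters.pop(i)
        clusters.extend([left, right])
    return clusters


data = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
print(bisecting_kmeans(data, k=2))
# prints [[(0.0, 0.0), (1.0, 1.0)], [(9.0, 8.0), (8.0, 9.0)]]
```

The sketch uses the same four points as the doctest. With k=4, every point ends up in its own cluster. Spark's seeding and its order for choosing which clusters to bisect may differ, so a real model can produce different splits.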

Methods Documentation

call(name, *a)

Call the method called name on the wrapped java_model, passing the arguments a, and return the result.

computeCost(x)

Return the bisecting k-means cost (the sum of squared distances of points to their nearest center) for this model on the given data. If given an RDD of points, return the sum of the costs over all points.

Parameters

x – A data point (or RDD of points) for which to compute the cost.

New in version 2.0.0.
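The docstring defines the cost as the sum of squared distances from points to their nearest center. The following minimal sketch shows that definition in plain Python. It assumes Euclidean distance and a flat list of centers, such as the one returned by clusterCenters. The function names point_cost and compute_cost are hypothetical. This code illustrates the formula and is not Spark's own code path.

```python
from typing import Iterable, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def point_cost(point: Sequence[float], centers: Sequence[Sequence[float]]) -> float:
    """Squared distance from a single point to its nearest center."""
    return min(sq_dist(point, c) for c in centers)


def compute_cost(points: Iterable[Sequence[float]], centers: Sequence[Sequence[float]]) -> float:
    """Sum of the per-point costs, analogous to passing a whole RDD."""
    return sum(point_cost(p, centers) for p in points)


centers = [(0.5, 0.5), (8.5, 8.5)]
data = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
print(point_cost((0.0, 0.0), centers))  # 0.5
print(compute_cost(data, centers))      # 2.0
```

A point that coincides with a center has cost 0.0. This matches the doctest, where computeCost(p) returns 0.0 for a point that is itself one of the four cluster centers.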

predict(x)

Find the cluster to which a point (or each point of an RDD) belongs in this model.

Parameters

x – A data point (or RDD of points) for which to determine the cluster index.

Returns

Predicted cluster index or an RDD of predicted cluster indices if the input is an RDD.

New in version 2.0.0.
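Conceptually, predicting a cluster means choosing the index of the closest center. The following minimal sketch shows that idea with a flat nearest-center search. It assumes Euclidean distance and a list of centers indexed like clusterCenters. The names predict and predict_all are hypothetical. This is an illustration only: Spark's model may assign points differently internally.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the nearest center; ties go to the lowest index."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def predict_all(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]) -> List[int]:
    """One cluster index per input point, analogous to predicting on an RDD."""
    return [predict(p, centers) for p in points]


centers = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
print(predict((0.0, 0.0), centers))                    # 0
print(predict_all([(0.9, 1.2), (8.2, 8.9)], centers))  # [1, 3]
```

Comparing squared distances gives the same ranking as comparing true Euclidean distances, so the sketch skips the square root.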

Attributes Documentation

clusterCenters

Get the cluster centers, represented as a list of NumPy arrays.

New in version 2.0.0.

k

Get the number of clusters.

New in version 2.0.0.