BisectingKMeansModel
class pyspark.mllib.clustering.BisectingKMeansModel(java_model)
A clustering model derived from the bisecting k-means method.
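>>> from numpy import array
>>> from pyspark.mllib.clustering import BisectingKMeans
>>> # sc is an existing SparkContext
>>> data = array([0.0, 0.0, 1.0, 1.0, 9.0, 8.0, 8.0, 9.0]).reshape(4, 2)
>>> bskm = BisectingKMeans()
>>> model = bskm.train(sc.parallelize(data, 2), k=4)
>>> p = array([0.0, 0.0])
>>> model.predict(p)
0
>>> model.k
4
>>> model.computeCost(p)
0.0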
New in version 2.0.0.
Methods Documentation
call(name, *a)
Call a method of java_model.
computeCost(x)
Return the bisecting k-means cost (the sum of squared distances from each point to its nearest center) for this model on the given data. If x is an RDD of points, the result is the sum over all points in the RDD.
Parameters
x – A data point (or RDD of points) for which to compute the cost.
New in version 2.0.0.
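The cost definition above can be illustrated without Spark. The following is a minimal plain-Python sketch of "sum of squared distances of points to their nearest center"; it is not the library's implementation, and the helper names (`squared_distance`, `compute_cost`) and the center values are hypothetical, chosen to mirror the four points in the class example.

```python
from typing import Iterable, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def compute_cost(points: Iterable[Point], centers: Sequence[Point]) -> float:
    """Sum, over all points, of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical centers: with k=4 and four input points, each point is its own center.
centers = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]

# A point that coincides with a center contributes zero cost.
single_cost = compute_cost([(0.0, 0.0)], centers)

# Each of these two points lies at squared distance 1.0 from its nearest center.
batch_cost = compute_cost([(0.0, 1.0), (9.0, 9.0)], centers)
```

A single point is passed here as a one-element list, whereas `computeCost` accepts either a single point or an RDD of points.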
predict(x)
Find the cluster that each of the points belongs to in this model.
Parameters
x – A data point (or RDD of points) to determine cluster index.
Returns
Predicted cluster index, or an RDD of predicted cluster indices if the input is an RDD.
New in version 2.0.0.
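To make the return value concrete, the sketch below assigns each point the index of its closest center using plain Python. It is a flat nearest-center lookup under stated assumptions, so it may not match how the library resolves a point internally; the function name `nearest_center_index` and the order of the centers are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest_center_index(point: Point, centers: Sequence[Point]) -> int:
    """Return the index of the center closest to `point` (squared Euclidean distance)."""

    def squared_distance(a: Point, b: Point) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


# Hypothetical centers; in a trained model these would come from model.clusterCenters.
centers = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]

# A single point yields a single index.
label = nearest_center_index((0.0, 0.0), centers)

# A batch of points yields one index per point, analogous to an RDD of indices.
batch = [(0.0, 0.0), (8.5, 9.5), (1.2, 0.9)]
labels: List[int] = [nearest_center_index(p, centers) for p in batch]
```

The single-point and batch cases correspond to the two input forms `predict` accepts: one index for one point, one index per point otherwise.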
Attributes Documentation
clusterCenters
Get the cluster centers, represented as a list of NumPy arrays.
New in version 2.0.0.
k
Get the number of clusters.
New in version 2.0.0.