StreamingKMeans
class pyspark.mllib.clustering.StreamingKMeans(k=2, decayFactor=1.0, timeUnit='batches')
Provides methods to set k, decayFactor, and timeUnit, which configure the KMeans algorithm for fitting and predicting on incoming DStreams. The StreamingKMeansModel documentation describes in more detail how the centroids are updated.
Parameters
k – Number of clusters. (default: 2)
decayFactor – Forgetfulness of the previous centroids, that is, how strongly earlier data is discounted when the centroids are updated. (default: 1.0)
timeUnit – Either "batches" or "points". If "points", the decay factor is raised to the power of the number of new points; if "batches", the decay factor is used as is. (default: "batches")
New in version 1.5.0.
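The effect of timeUnit on the decay factor can be illustrated with a small plain-Python sketch. It does not call Spark. The helper names `effective_discount` and `update_centroid` are hypothetical, and the weighted-average step is an assumption about how a discounted update could look, not the library's own code; the StreamingKMeansModel documentation remains the authoritative description.

```python
def effective_discount(decay_factor, time_unit, num_new_points):
    """Discount applied to the old centroid weights for one batch.

    Hypothetical helper mirroring the timeUnit description above.
    """
    if time_unit == "batches":
        # The decay factor is used as is, once per batch.
        return decay_factor
    if time_unit == "points":
        # The decay factor is raised to the number of new points.
        return decay_factor ** num_new_points
    raise ValueError("time_unit must be 'batches' or 'points'")


def update_centroid(old_center, old_weight, batch_mean, batch_count, discount):
    """Assumed update: weighted average of the discounted old centroid
    and the mean of the new points (one-dimensional for brevity)."""
    discounted_weight = old_weight * discount
    new_weight = discounted_weight + batch_count
    new_center = (old_center * discounted_weight
                  + batch_mean * batch_count) / new_weight
    return new_center, new_weight


if __name__ == "__main__":
    d = effective_discount(0.5, "points", 3)
    print(d)
    print(update_centroid(0.0, 1.0, 1.0, 1, 1.0))
```

With a discount of 1.0 every past point counts as much as a new one; with a discount of 0.0 the old centroid is forgotten and the batch mean takes over.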
Methods Documentation
predictOn(dstream)
Make predictions on a DStream. Returns a transformed DStream object.
New in version 1.5.0.
predictOnValues(dstream)
Make predictions on a keyed DStream. Returns a transformed DStream object.
New in version 1.5.0.
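To make the difference between an unkeyed and a keyed prediction concrete, the following plain-Python sketch mimics what happens to the records of a single batch. It is only an illustration: `nearest_center`, `predict_on` and `predict_on_values` are hypothetical helpers operating on ordinary lists, and the nearest-center rule (squared Euclidean distance) is assumed here rather than taken from this page.

```python
def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def predict_on(batch, centers):
    """Unkeyed case: each vector is replaced by its cluster index."""
    return [nearest_center(vec, centers) for vec in batch]


def predict_on_values(batch, centers):
    """Keyed case: keys pass through, values become cluster indices."""
    return [(key, nearest_center(vec, centers)) for key, vec in batch]


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    print(predict_on([[1.0, 1.0], [9.0, 8.0]], centers))
    print(predict_on_values([("a", [1.0, 1.0]), ("b", [9.0, 8.0])], centers))
```

Keeping the key alongside the prediction is useful when each record carries an identifier or a true label that should be compared with the assigned cluster later.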
setHalfLife(halfLife, timeUnit)
Set the half-life: the number of batches (or points, depending on timeUnit) after which the data from a given batch carries half of its original weight in the centroids.
New in version 1.5.0.
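A half-life and a decay factor describe the same forgetting rate. The sketch below shows one consistent conversion, assuming that a contribution is multiplied by the decay factor once per time unit, so that after halfLife units it has been multiplied by 0.5. The function name `decay_factor_from_half_life` is hypothetical and is not part of the API.

```python
import math


def decay_factor_from_half_life(half_life):
    """Decay factor a such that a ** half_life == 0.5.

    Hypothetical helper; assumes one multiplication by the decay
    factor per time unit (batch or point).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.exp(math.log(0.5) / half_life)


if __name__ == "__main__":
    a = decay_factor_from_half_life(5)
    # After 5 time units the accumulated weight is close to one half.
    print(a, a ** 5)
```

A larger half-life gives a decay factor closer to 1.0, so older batches keep their influence for longer.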
setInitialCenters(centers, weights)
Set the initial centers and their weights. Should be called before trainOn.
New in version 1.5.0.