StreamingLinearRegressionWithSGD
class pyspark.mllib.regression.StreamingLinearRegressionWithSGD(stepSize=0.1, numIterations=50, miniBatchFraction=1.0, convergenceTol=0.001)
Train or predict a linear regression model on streaming data. Training uses Stochastic Gradient Descent (SGD) to update the model on each new batch of incoming data from a DStream (see LinearRegressionWithSGD for the model equation).
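For orientation, the prediction, the least-squares objective stated for LinearRegressionWithSGD, and a single mini-batch gradient step can be written as below. Here w is the weight vector, b the intercept, (x_i, y_i) the n labeled points of one batch (the rows of A are the x_i), η is stepSize and S_t is the set of points sampled with fraction miniBatchFraction at iteration t. The decaying factor 1/√t is an assumption about MLlib's default updater. In the streaming setting each batch starts from the weights left by the previous batch.

```latex
\hat{y} = w^\top x + b

f(w) = \frac{1}{2n} \sum_{i=1}^{n} \left( w^\top x_i - y_i \right)^2 = \frac{1}{2n} \lVert A w - y \rVert_2^2

w_t = w_{t-1} - \frac{\eta}{\sqrt{t}} \cdot \frac{1}{|S_t|} \sum_{i \in S_t} \left( w_{t-1}^\top x_i - y_i \right) x_i
```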
Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided with setInitialWeights before calling trainOn, predictOn or predictOnValues.
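The streaming behaviour described above (one SGD run per batch, warm-started from the previous weights, variable batch sizes, a fixed feature count, mandatory initial weights) can be illustrated with a small pure-Python simulation. This is a hedged sketch, not the PySpark implementation: LinearModelSketch, StreamingLinearRegressionSketch and their methods are hypothetical names, each batch is a plain list of (label, features) tuples standing in for an RDD of LabeledPoints, and the update assumes a least-squares loss, a step size that decays as stepSize/√t, and a relative-change stopping test. The intercept is left at its initial value of 0.

```python
# Hypothetical simulation of streaming linear regression; not the PySpark API.
import math
import random


class LinearModelSketch:
    """Hypothetical stand-in for a fitted linear model: weights plus an intercept."""

    def __init__(self, weights, intercept=0.0):
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


class StreamingLinearRegressionSketch:
    """Pure-Python simulation of streaming SGD; a batch is a list of (label, features)."""

    def __init__(self, step_size=0.1, num_iterations=50,
                 mini_batch_fraction=1.0, convergence_tol=0.001, seed=0):
        self.step_size = step_size
        self.num_iterations = num_iterations
        self.mini_batch_fraction = mini_batch_fraction
        self.convergence_tol = convergence_tol
        self._rng = random.Random(seed)
        self._model = None

    def set_initial_weights(self, weights):
        self._model = LinearModelSketch(weights)
        return self

    def latest_model(self):
        return self._model

    def _train_batch(self, batch):
        w = list(self._model.weights)  # warm start from the previous batch's result
        for t in range(1, self.num_iterations + 1):
            # Keep each point with probability mini_batch_fraction (all points at 1.0).
            sample = [p for p in batch if self._rng.random() < self.mini_batch_fraction]
            if not sample:
                continue
            # Average least-squares gradient (w.x - y) * x over the sample.
            grad = [0.0] * len(w)
            for label, x in sample:
                err = sum(wj * xj for wj, xj in zip(w, x)) - label
                for j, xj in enumerate(x):
                    grad[j] += err * xj / len(sample)
            eta = self.step_size / math.sqrt(t)  # decaying step size
            new_w = [wj - eta * gj for wj, gj in zip(w, grad)]
            step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_w, w)))
            norm = math.sqrt(sum(a * a for a in new_w))
            w = new_w
            if step < self.convergence_tol * max(norm, 1.0):
                break  # converged before num_iterations
        # Replace the stored model instead of mutating it; the intercept is not trained.
        self._model = LinearModelSketch(w, self._model.intercept)

    def train_on(self, batches):
        if self._model is None:
            raise ValueError("set initial weights before training")
        for batch in batches:
            if not batch:
                continue  # an empty batch leaves the model unchanged
            if any(len(x) != len(self._model.weights) for _, x in batch):
                raise ValueError("the number of features must stay constant")
            self._train_batch(batch)


# Usage: y = 2 * x, delivered as three batches of different sizes.
batches = [[(2.0 * x, [x]) for x in xs] for xs in ([1.0, 2.0], [0.5, 1.5, 3.0], [1.0])]
model = StreamingLinearRegressionSketch().set_initial_weights([0.0])
before = model.latest_model()
model.train_on(batches)
print(model.latest_model().weights)  # close to [2.0]
```

Because the sketch replaces the stored model after each batch, a reference obtained earlier (before in the usage lines) keeps the old weights; calling latest_model() again is how the newest weights are seen. With the real class, the corresponding calls are setInitialWeights, trainOn and latestModel.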
Parameters
stepSize – Step size for each iteration of gradient descent. (default: 0.1)
numIterations – Number of iterations run for each batch of data. (default: 50)
miniBatchFraction – Fraction of each batch of data to use for updates. (default: 1.0)
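convergenceTol – Convergence tolerance: iterations on a batch stop early once the change in the weight vector between successive iterations is smaller than this value scaled by the norm of the current weights (or by 1 if that norm is below 1). (default: 0.001)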
New in version 1.5.0.
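The stopping rule behind convergenceTol can be sketched as follows, assuming it matches the relative-change test used by MLlib's gradient descent: the Euclidean norm of the step between two consecutive weight vectors is compared with convergenceTol multiplied by the norm of the newer weights, clamped to at least 1. The helper is_converged is a hypothetical Python name. Because of this test, numIterations acts as an upper bound rather than an exact iteration count.

```python
import math


def is_converged(previous, current, convergence_tol=0.001):
    """Hypothetical helper: True when the step between two weight vectors is small.

    The step's Euclidean norm is compared with convergence_tol scaled by the
    norm of the current weights, where the scale is never less than 1.
    """
    step = math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))
    scale = max(math.sqrt(sum(c * c for c in current)), 1.0)
    return step < convergence_tol * scale


# Small weights: the scale is clamped to 1, so the absolute step must be below 0.001.
print(is_converged([0.1, 0.2], [0.1005, 0.2]))   # True
# Large weights: a step of 0.005 is still small relative to a norm of about 10.
print(is_converged([6.0, 8.0], [6.003, 8.004]))  # True
print(is_converged([6.0, 8.0], [6.03, 8.04]))    # False
```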
Methods Documentation
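latestModel()
Returns the latest model, i.e. the LinearRegressionModel produced by the most recent training batch, or the one built from the initial weights if no batch has been trained yet.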
New in version 1.5.0.
predictOn(dstream)
Use the latest model to make predictions on batches of data from a DStream whose elements are feature vectors.
Returns
DStream containing predictions.
New in version 1.5.0.
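predictOn applies the model's predict to every element of every batch, so the input stream must already hold feature vectors rather than LabeledPoints. The pure-Python sketch below mimics that mapping, with each inner list standing in for one RDD of the DStream. LinearModelSketch and predict_on are hypothetical names, and the prediction is assumed to be the weighted sum of the features plus an intercept.

```python
class LinearModelSketch:
    """Hypothetical stand-in for the model returned by latestModel()."""

    def __init__(self, weights, intercept=0.0):
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, features):
        # Dot product of weights and features, plus the intercept.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


def predict_on(batches, model):
    """Simulate predictOn: map every feature vector in every batch to a prediction."""
    return [[model.predict(features) for features in batch] for batch in batches]


model = LinearModelSketch([2.0, -1.0])
# Each inner list stands in for one RDD of the DStream; elements are feature vectors.
feature_batches = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 2.0]]]
predictions = predict_on(feature_batches, model)
print(predictions)  # [[2.0, -1.0], [4.0]]
```

With a stream of LabeledPoints, the features are typically extracted first, for example predictOn(stream.map(lambda lp: lp.features)).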
predictOnValues(dstream)
Use the model to make predictions on the values of a DStream of (key, feature vector) pairs and carry over its keys.
Returns
DStream containing the input keys and the predictions as values.
New in version 1.5.0.
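predictOnValues performs the same mapping on the values of (key, feature vector) pairs and leaves the keys untouched. A common choice of key is the true label, so that each output pair places the label next to its prediction. The sketch below simulates this in plain Python; LinearModelSketch and predict_on_values are hypothetical names, and each inner list again stands in for one RDD.

```python
class LinearModelSketch:
    """Hypothetical stand-in for the model returned by latestModel()."""

    def __init__(self, weights, intercept=0.0):
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, features):
        # Dot product of weights and features, plus the intercept.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


def predict_on_values(batches, model):
    """Simulate predictOnValues: keep each key, replace its features by a prediction."""
    return [[(key, model.predict(features)) for key, features in batch]
            for batch in batches]


model = LinearModelSketch([0.5], intercept=1.0)
# Keys are the true labels, so each output pair reads (label, prediction).
keyed_batches = [[(2.0, [2.0]), (3.5, [4.0])], [(1.0, [0.0])]]
keyed = predict_on_values(keyed_batches, model)
print(keyed)  # [[(2.0, 2.0), (3.5, 3.0)], [(1.0, 1.0)]]
```

In PySpark such a keyed stream is commonly built with stream.map(lambda lp: (lp.label, lp.features)) before it is passed to predictOnValues.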