LassoWithSGD

class pyspark.mllib.regression.LassoWithSGD

New in version 0.9.0.

Note

Deprecated in 2.0.0. Use ml.regression.LinearRegression with elasticNetParam = 1.0. Note that the default regParam is 0.01 for LassoWithSGD but 0.0 for LinearRegression, so set regParam explicitly to keep the same regularization strength.
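The migration described in the note can be sketched as follows. The helper names lasso_equivalent_params and fit_lasso are hypothetical, and train_df is assumed to be a DataFrame with "features" and "label" columns. Mapping iterations to maxIter and intercept to fitIntercept is an assumption made for illustration. LinearRegression uses a different solver and standardizes features by default, so the fitted coefficients are not expected to match LassoWithSGD exactly.

```python
def lasso_equivalent_params(reg_param=0.01, iterations=100, intercept=False):
    """Build LinearRegression keyword arguments that mimic LassoWithSGD defaults.

    Hypothetical helper: the parameter mapping is an illustrative assumption.
    """
    return {
        "elasticNetParam": 1.0,     # pure L1 penalty, as the note recommends
        "regParam": reg_param,      # LinearRegression would otherwise default to 0.0
        "maxIter": iterations,      # rough counterpart of `iterations`
        "fitIntercept": intercept,  # LassoWithSGD.train defaults to intercept=False
    }


def fit_lasso(train_df, **overrides):
    """Fit an L1-regularized model with the DataFrame-based API (needs PySpark)."""
    # Imported lazily so the parameter helper can be used without Spark installed.
    from pyspark.ml.regression import LinearRegression

    estimator = LinearRegression(**lasso_equivalent_params(**overrides))
    return estimator.fit(train_df)
```

Calling fit_lasso(train_df) with no overrides keeps regParam at 0.01, the LassoWithSGD default.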

Methods Documentation

classmethod train(data, iterations=100, step=1.0, regParam=0.01, miniBatchFraction=1.0, initialWeights=None, intercept=False, validateData=True, convergenceTol=0.001)

Train a regression model with L1 regularization using stochastic gradient descent (SGD). This solves the L1-regularized least squares regression formulation:

f(weights) = 1/(2n) ||A weights - y||^2 + regParam ||weights||_1

Here the data matrix A has n rows, and the input RDD holds the rows of A, each paired with its corresponding right-hand-side label y. See also the documentation for the precise formulation.
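As an illustration of the formula above, the objective can be evaluated in plain Python. This is a minimal sketch rather than MLlib code: the function name lasso_objective is hypothetical, and rows and labels stand in for the rows of A and the entries of y.

```python
def lasso_objective(rows, labels, weights, reg_param=0.01):
    """Evaluate f(weights) = 1/(2n) ||A weights - y||^2 + regParam ||weights||_1."""
    n = len(rows)
    squared_error = 0.0
    for features, y in zip(rows, labels):
        # One row of A dotted with the weight vector gives the prediction.
        prediction = sum(w * x for w, x in zip(weights, features))
        squared_error += (prediction - y) ** 2
    l1_norm = sum(abs(w) for w in weights)
    return squared_error / (2 * n) + reg_param * l1_norm


rows = [[1.0, 0.0], [0.0, 1.0]]
labels = [1.0, 2.0]
value = lasso_objective(rows, labels, weights=[1.0, 1.0])
```

For this input the squared error is 1.0 over n = 2 rows, so the first term is 0.25, and the penalty is 0.01 * 2 = 0.02.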

Parameters
  • data – The training data, an RDD of LabeledPoint.

  • iterations – The number of iterations. (default: 100)

  • step – The step parameter used in SGD. (default: 1.0)

  • regParam – The regularizer parameter. (default: 0.01)

  • miniBatchFraction – Fraction of data to be used for each SGD iteration. (default: 1.0)

  • initialWeights – The initial weights. (default: None)

  • intercept – Boolean parameter which indicates whether to use the augmented representation for training data (i.e., whether bias features are activated). (default: False)

  • validateData – Boolean parameter which indicates whether the algorithm should validate data before training. (default: True)

  • convergenceTol – The tolerance used to decide when to stop iterating. (default: 0.001)
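To show how these parameters interact, here is a single-machine sketch of mini-batch SGD with an L1 penalty. It is not the MLlib implementation. The step size decay of step / sqrt(iteration), the soft-thresholding update, and the relative-change stopping rule are all assumptions chosen for illustration. The names train_lasso_sgd and soft_threshold are hypothetical, the intercept and validateData options are not modeled, and each data point is a plain (label, features) tuple rather than a LabeledPoint.

```python
import math
import random


def soft_threshold(value, shrinkage):
    """Shrink value toward zero by shrinkage, clamping at zero (L1 proximal step)."""
    return math.copysign(max(abs(value) - shrinkage, 0.0), value)


def train_lasso_sgd(points, iterations=100, step=1.0, reg_param=0.01,
                    mini_batch_fraction=1.0, initial_weights=None,
                    convergence_tol=0.001, seed=42):
    """Illustrative mini-batch SGD for L1-regularized least squares."""
    rng = random.Random(seed)
    dim = len(points[0][1])
    weights = list(initial_weights) if initial_weights is not None else [0.0] * dim
    for i in range(1, iterations + 1):
        # Sample roughly mini_batch_fraction of the data for this iteration.
        batch = [p for p in points if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        gradient = [0.0] * dim
        for label, features in batch:
            error = sum(w * x for w, x in zip(weights, features)) - label
            for j, x in enumerate(features):
                gradient[j] += error * x
        step_i = step / math.sqrt(i)  # assumed decay schedule
        new_weights = [
            soft_threshold(w - step_i * g / len(batch), reg_param * step_i)
            for w, g in zip(weights, gradient)
        ]
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_weights, weights)))
        norm = math.sqrt(sum(w * w for w in new_weights))
        weights = new_weights
        # Assumed stopping rule: relative change in weights below the tolerance.
        if change < convergence_tol * max(norm, 1.0):
            break
    return weights


# Toy data generated from label = 2 * x with a single feature.
points = [(2.0 * k / 10, [k / 10]) for k in range(1, 11)]
weights = train_lasso_sgd(points)
```

On this toy data the learned weight ends up slightly below 2.0, because the L1 penalty pulls it toward zero.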

New in version 0.9.0.