LinearDataGenerator

class pyspark.mllib.util.LinearDataGenerator

Utils for generating linear data.

New in version 1.5.0.

Methods Documentation

static generateLinearInput(intercept, weights, xMean, xVariance, nPoints, seed, eps)

Parameters

intercept: Bias factor, the term c in X’w + c.

weights: Weight vector, the term w in X’w + c.

xMean: Point around which the data X is centered.

xVariance: Variance of the given data.

nPoints: Number of points to be generated.

seed: Random seed.

eps: Scale of the noise. Larger values of eps add more Gaussian noise.

Returns

A list of LabeledPoints of length nPoints.

New in version 1.5.0.
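
The relationship between the parameters of generateLinearInput can be illustrated with a small pure-Python sketch of the model label = X’w + c + eps * noise. This is not Spark's implementation: the helper name sketch_linear_input is hypothetical, and drawing the features from a normal distribution is an assumption made only for illustration. It returns plain (label, features) tuples rather than LabeledPoints.

```python
import random


def sketch_linear_input(intercept, weights, x_mean, x_variance, n_points, seed, eps):
    """Return n_points (label, features) pairs following label = x.w + c + eps * noise.

    Illustrative only: sampling features from a normal distribution is an
    assumption of this sketch, not a statement about what Spark does.
    """
    rng = random.Random(seed)  # the same seed reproduces the same data
    points = []
    for _ in range(n_points):
        # One feature per weight, centered on x_mean with the given variance.
        features = [rng.gauss(m, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        # Noise-free part of the label: dot product plus intercept.
        signal = sum(x * w for x, w in zip(features, weights)) + intercept
        # eps scales the standard Gaussian noise added to the label.
        label = signal + eps * rng.gauss(0.0, 1.0)
        points.append((label, features))
    return points


# With eps=0.0 every label lies exactly on the plane 2*x0 - 3*x1 + 1.
noiseless = sketch_linear_input(1.0, [2.0, -3.0], [0.0, 5.0], [1.0, 4.0], 5, seed=42, eps=0.0)
```

Setting eps to 0.0 gives labels that satisfy the linear relation exactly, and increasing eps spreads the labels further from it, which matches the description of eps above.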

static generateLinearRDD(sc, nexamples, nfeatures, eps, nParts=2, intercept=0.0)

Generate an RDD of LabeledPoints.

New in version 1.5.0.
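
A call to generateLinearRDD might look like the sketch below. The helper name summarize_linear_rdd and its default argument values are hypothetical, chosen only for illustration. It assumes the caller already has an active SparkContext to pass in as sc.

```python
def summarize_linear_rdd(sc, n_examples=1000, n_features=3, eps=0.1):
    """Generate a synthetic regression RDD and return a small summary of it.

    sc is assumed to be an active SparkContext supplied by the caller.
    """
    # Imported inside the function so this module loads without Spark installed.
    from pyspark.mllib.util import LinearDataGenerator

    rdd = LinearDataGenerator.generateLinearRDD(
        sc, n_examples, n_features, eps, nParts=2, intercept=0.0
    )
    # Each element is a LabeledPoint with a label and a feature vector.
    return rdd.count(), rdd.getNumPartitions(), rdd.first()
```

From a PySpark shell, where sc is already defined, summarize_linear_rdd(sc) returns the number of generated points, the number of partitions, and the first LabeledPoint.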