ALS

class pyspark.mllib.recommendation.ALS

Alternating Least Squares matrix factorization

New in version 0.9.0.

Methods Documentation

classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.

Parameters
  • ratings – RDD of Rating objects or (userID, productID, rating) tuples.

  • rank – Number of features to use (also referred to as the number of latent factors).

  • iterations – Number of iterations of ALS. (default: 5)

  • lambda_ – Regularization parameter. (default: 0.01)

  • blocks – Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)

  • nonnegative – A value of True will solve least-squares with nonnegativity constraints. (default: False)

  • seed – Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)

New in version 0.9.0.
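
The alternation described for train can be illustrated without Spark. The following is a minimal pure-Python sketch, not the MLlib implementation: it fixes the rank at 1 so that each regularized least-squares solve reduces to a scalar division, and it runs on a single machine with no blocks. The function name als_rank1 is hypothetical; its iterations, lambda_ and seed arguments only mirror the roles of the parameters listed above.

```python
import random


def als_rank1(ratings, iterations=5, lambda_=0.01, seed=None):
    """Fit r[u, p] ~= x[u] * y[p] by alternating regularized least squares.

    Hypothetical helper for illustration only (rank fixed at 1).
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    products = sorted({p for _, p, _ in ratings})
    # Random positive starting factors; a fixed seed makes the run repeatable.
    x = {u: rng.uniform(0.1, 1.0) for u in users}
    y = {p: rng.uniform(0.1, 1.0) for p in products}

    for _ in range(iterations):
        # Hold the product factors fixed and solve for each user factor.
        for u in users:
            num = sum(y[p] * r for uu, p, r in ratings if uu == u)
            den = sum(y[p] ** 2 for uu, p, _ in ratings if uu == u)
            x[u] = num / (den + lambda_)
        # Hold the user factors fixed and solve for each product factor.
        for p in products:
            num = sum(x[u] * r for u, pp, r in ratings if pp == p)
            den = sum(x[u] ** 2 for u, pp, _ in ratings if pp == p)
            y[p] = num / (den + lambda_)
    return x, y


# (userID, productID, rating) tuples, the same shape that train accepts.
ratings = [(1, 1, 4.0), (1, 2, 2.0), (2, 1, 2.0), (2, 2, 1.0)]
x, y = als_rank1(ratings, iterations=20, lambda_=0.01, seed=10)
predicted = x[1] * y[1]  # reconstructed rating for user 1, product 1
```

Because this toy ratings matrix is exactly rank 1, the reconstructed value lands close to the observed rating of 4.0, pulled slightly toward zero by the regularization term. With a higher rank, each scalar division becomes a small linear solve per user or product.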

classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.

Parameters
  • ratings – RDD of Rating objects or (userID, productID, rating) tuples.

  • rank – Number of features to use (also referred to as the number of latent factors).

  • iterations – Number of iterations of ALS. (default: 5)

  • lambda_ – Regularization parameter. (default: 0.01)

  • blocks – Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)

  • alpha – A constant used in computing confidence. (default: 0.01)

  • nonnegative – A value of True will solve least-squares with nonnegativity constraints. (default: False)

  • seed – Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)

New in version 0.9.0.
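
To show how a confidence constant such as alpha can enter the fit, here is a minimal pure-Python sketch under stated assumptions. It is not the MLlib implementation. It assumes the formulation common in the implicit-feedback literature: each (user, product) pair has a binary preference (1 if any interaction was observed, 0 otherwise) and a confidence weight of 1 + alpha * r, and every pair, observed or not, contributes to the weighted least-squares solve. The rank is fixed at 1 so each solve is a scalar division, and the name implicit_als_rank1 is hypothetical.

```python
import random


def implicit_als_rank1(ratings, iterations=5, lambda_=0.01, alpha=0.01, seed=None):
    """Confidence-weighted rank-1 factorization of binary preferences.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    products = sorted({p for _, p, _ in ratings})
    observed = {(u, p): r for u, p, r in ratings}

    def confidence(u, p):
        # Assumed form: 1 + alpha * r; unobserved pairs keep the baseline of 1.
        return 1.0 + alpha * observed.get((u, p), 0.0)

    def preference(u, p):
        # Assumed form: 1 if an interaction was observed, otherwise 0.
        return 1.0 if observed.get((u, p), 0.0) > 0 else 0.0

    x = {u: rng.uniform(0.1, 1.0) for u in users}
    y = {p: rng.uniform(0.1, 1.0) for p in products}
    for _ in range(iterations):
        # Hold the product factors fixed; every product counts, seen or not.
        for u in users:
            num = sum(confidence(u, p) * preference(u, p) * y[p] for p in products)
            den = sum(confidence(u, p) * y[p] ** 2 for p in products)
            x[u] = num / (den + lambda_)
        # Hold the user factors fixed; every user counts, seen or not.
        for p in products:
            num = sum(confidence(u, p) * preference(u, p) * x[u] for u in users)
            den = sum(confidence(u, p) * x[u] ** 2 for u in users)
            y[p] = num / (den + lambda_)
    return x, y


# Interaction counts (for example clicks), not explicit ratings.
clicks = [(1, 1, 30.0), (1, 2, 1.0), (2, 1, 25.0)]
x, y = implicit_als_rank1(clicks, iterations=20, alpha=1.0, seed=10)
score_seen = x[2] * y[1]    # user 2 interacted with product 1 many times
score_unseen = x[2] * y[2]  # user 2 never interacted with product 2
```

In this sketch, the scores are preference estimates rather than predicted ratings, so they are best read as a ranking: the product that user 2 interacted with scores higher than the one they never touched. A larger alpha makes frequently observed pairs dominate the fit more strongly.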