IsotonicRegression

class pyspark.mllib.regression.IsotonicRegression

Isotonic regression. Currently implemented using a parallelized pool adjacent violators (PAV) algorithm. Only the univariate (single-feature) case is supported.
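To illustrate what pool adjacent violators does, here is a minimal sequential sketch in plain Python. It is not Spark's implementation: the function name `pav` and its signature are hypothetical, it assumes the labels are already ordered by feature value, and it assumes all weights are strictly positive.

```python
def pav(labels, weights=None, isotonic=True):
    """Sequential weighted pool-adjacent-violators fit (illustrative sketch).

    Assumes `labels` are already ordered by feature and weights are positive.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    # An antitonic fit is an isotonic fit on negated labels
    sign = 1.0 if isotonic else -1.0
    # Each block holds [weighted mean, total weight, number of points]
    blocks = []
    for y, w in zip(labels, weights):
        blocks.append([sign * y, float(w), 1])
        # Pool backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each block back to one fitted value per input point
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([sign * mean] * count)
    return fitted


print(pav([1, 2, 3, 1, 6, 17, 16]))
# [1.0, 2.0, 2.0, 2.0, 6.0, 16.5, 16.5]
```

Adjacent points that break the ordering (here 3 followed by 1, and 17 followed by 16) are replaced by their weighted mean, so the result is the closest non-decreasing sequence in the weighted least-squares sense.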

Sequential PAV implementation based on:

Tibshirani, Ryan J., Holger Hoefling, and Robert Tibshirani. “Nearly-isotonic regression.” Technometrics 53.1 (2011): 54-61. Available from http://www.stat.cmu.edu/~ryantibs/papers/neariso.pdf

Sequential PAV parallelization based on:

Kearsley, Anthony J., Richard A. Tapia, and Michael W. Trosset. “An approach to parallelizing isotonic regression.” Applied Mathematics and Parallel Computing. Physica-Verlag HD, 1996. 141-147. Available from http://softlib.rice.edu/pub/CRPC-TRs/reports/CRPC-TR96640.pdf

See Isotonic regression (Wikipedia).

New in version 1.4.0.

Methods

train(data[, isotonic])    Train an isotonic regression model on the given data.

Methods Documentation

classmethod train(data, isotonic=True)

Train an isotonic regression model on the given data.

Parameters
  • data – RDD of (label, feature, weight) tuples.

  • isotonic – Whether to fit an isotonic (non-decreasing) model or, if False, an antitonic (non-increasing) one. (default: True)

New in version 1.4.0.
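
A short usage sketch follows, since the tuple order expected by `train` (label first, then feature, then weight) is easy to get wrong. The helper names `to_training_tuples` and `train_example` are hypothetical and not part of PySpark; `sc` is assumed to be an existing SparkContext.

```python
def to_training_tuples(points, weight=1.0):
    """Turn (feature, label) pairs into (label, feature, weight) tuples.

    Hypothetical helper: every point gets the same weight.
    """
    return [(float(label), float(feature), float(weight))
            for feature, label in points]


def train_example(sc, points, isotonic=True):
    """Train a model from (feature, label) pairs using an existing SparkContext."""
    # Imported here so the helper above stays usable without a Spark installation
    from pyspark.mllib.regression import IsotonicRegression

    rdd = sc.parallelize(to_training_tuples(points))
    return IsotonicRegression.train(rdd, isotonic=isotonic)
```

With a running SparkContext, `train_example(sc, [(0, 1), (1, 2), (2, 3), (3, 1)])` should return an IsotonicRegressionModel whose `predict` method accepts a single feature value or an RDD of feature values. Passing `isotonic=False` requests an antitonic fit instead.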