Normalizer

class pyspark.mllib.feature.Normalizer(p=2.0)[source]

Normalizes samples individually to unit L^p norm.

For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.

For p = float('inf'), max(abs(vector)) is used as the norm.
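
The two norm definitions above can be sketched in plain Python. This is only an illustration of the formulas, not the Spark implementation; `lp_norm` is a hypothetical helper name, and the input is assumed to be a plain list of floats rather than an MLlib vector.

```python
import math

def lp_norm(values, p=2.0):
    """Hypothetical helper (not part of PySpark): L^p norm of a list of numbers."""
    if p < 1:
        raise ValueError("p must be >= 1")
    magnitudes = [abs(x) for x in values]
    if math.isinf(p):
        # p = inf: the norm is the largest absolute component
        return max(magnitudes, default=0.0)
    # 1 <= p < inf: sum(abs(vector)^p)^(1/p)
    return sum(m ** p for m in magnitudes) ** (1.0 / p)

v = [0.0, 1.0, 2.0]
l1 = lp_norm(v, 1)                 # sum of absolute values
l2 = lp_norm(v, 2)                 # Euclidean length
linf = lp_norm(v, float("inf"))    # largest absolute value
```

Dividing each component of `v` by `l1` or `linf` gives the same values as the `Normalizer(1)` and `Normalizer(float("inf"))` results in the doctest on this page.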

Parameters

p – Normalization in L^p space; p = 2 by default.

>>> from pyspark.mllib.feature import Normalizer
>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])
>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]
>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])

New in version 1.2.0.

Methods

Methods Documentation

transform(vector)[source]

Applies unit-length normalization to a vector.

Parameters

vector – a vector or an RDD of vectors to be normalized.

Returns

The normalized vector. If the norm of the input is zero, the input vector is returned unchanged.

New in version 1.2.0.
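
The zero-norm rule described under Returns can be illustrated with a small pure-Python sketch. It only mirrors the documented behaviour and does not call Spark; `normalize` is a hypothetical helper name, and plain lists stand in for MLlib vectors.

```python
import math

def normalize(values, p=2.0):
    """Hypothetical helper (not part of PySpark): scale a list to unit L^p norm."""
    magnitudes = [abs(x) for x in values]
    if math.isinf(p):
        norm = max(magnitudes, default=0.0)
    else:
        norm = sum(m ** p for m in magnitudes) ** (1.0 / p)
    if norm == 0.0:
        # Documented behaviour: a zero-norm input is returned as-is,
        # which also avoids a division by zero.
        return list(values)
    return [x / norm for x in values]

scaled = normalize([0.0, 1.0, 2.0], 1)   # components divided by the L1 norm
zeros = normalize([0.0, 0.0, 0.0])       # all-zero input comes back unchanged
```

Returning the input for a zero norm keeps the result well defined; the alternative would be a division by zero.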