CoordinateMatrix

class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries, numRows=0, numCols=0)

Represents a matrix in coordinate format.

Parameters
  • entries – An RDD of MatrixEntry inputs or (long, long, float) tuples.

  • numRows – Number of rows in the matrix. A non-positive value means unknown, in which case the number of rows is determined by the max row index plus one.

  • numCols – Number of columns in the matrix. A non-positive value means unknown, in which case the number of columns is determined by the max column index plus one.
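The sizing rule for numRows and numCols can be illustrated without Spark. The following is a minimal plain-Python sketch, not the library's implementation: `infer_dims` is a hypothetical helper, and entries are assumed to be a non-empty list of (row, column, value) tuples rather than an RDD.

```python
def infer_dims(entries, num_rows=0, num_cols=0):
    """Return (rows, cols) for a non-empty list of (i, j, value) tuples.

    Hypothetical helper: a non-positive size means "unknown" and is
    replaced by the largest index seen along that axis, plus one.
    """
    if num_rows <= 0:
        num_rows = max(i for i, _, _ in entries) + 1
    if num_cols <= 0:
        num_cols = max(j for _, j, _ in entries) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
print(infer_dims(entries))        # (3, 2): sizes inferred from the indices
print(infer_dims(entries, 7, 6))  # (7, 6): explicit sizes are kept as given
```

Explicit sizes larger than the highest index are legal in this sketch; they simply describe trailing rows or columns with no stored entries.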

Methods Documentation

numCols()

Get or compute the number of columns.

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6

numRows()

Get or compute the number of rows.

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7

toBlockMatrix(rowsPerBlock=1024, colsPerBlock=1024)

Convert this matrix to a BlockMatrix.

Parameters
  • rowsPerBlock – Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.

  • colsPerBlock – Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
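To make the block layout concrete, here is a plain-Python sketch of how coordinate entries could be assigned to blocks and why the trailing blocks may be smaller. It is an illustration under stated assumptions, not Spark's code: `group_into_blocks` and `block_shape` are hypothetical helpers, and entries are plain (row, column, value) tuples.

```python
from collections import defaultdict


def group_into_blocks(entries, rows_per_block, cols_per_block):
    """Map each (i, j, value) entry to the block that contains it."""
    blocks = defaultdict(list)
    for i, j, value in entries:
        key = (i // rows_per_block, j // cols_per_block)
        # Store coordinates relative to the block's top-left corner.
        blocks[key].append((i % rows_per_block, j % cols_per_block, value))
    return dict(blocks)


def block_shape(block_row, block_col, num_rows, num_cols,
                rows_per_block, cols_per_block):
    """Shape of one block; blocks on the last row/column may be smaller."""
    rows = min(rows_per_block, num_rows - block_row * rows_per_block)
    cols = min(cols_per_block, num_cols - block_col * cols_per_block)
    return rows, cols


# A 7 x 5 matrix split into blocks of at most 4 x 4.
entries = [(0, 0, 1.2), (6, 4, 2.1)]
blocks = group_into_blocks(entries, 4, 4)
print(blocks)                        # {(0, 0): [(0, 0, 1.2)], (1, 1): [(2, 0, 2.1)]}
print(block_shape(0, 0, 7, 5, 4, 4))  # (4, 4): a full block
print(block_shape(1, 1, 7, 5, 4, 4))  # (3, 1): the trailing block is smaller
```

With the default block size of 1024 x 1024, a small matrix such as this one would fit in a single block.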

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5

toIndexedRowMatrix()

Convert this matrix to an IndexedRowMatrix.

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5

toRowMatrix()

Convert this matrix to a RowMatrix.
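A RowMatrix carries no row indices, so its row count can be smaller than that of the source matrix. The plain-Python sketch below shows one way to picture this, assuming the conversion groups entries by row index and then discards the index; `rows_by_index` is a hypothetical helper and not part of the Spark API.

```python
from collections import defaultdict


def rows_by_index(entries):
    """Group (i, j, value) entries into {row_index: {col_index: value}}."""
    rows = defaultdict(dict)
    for i, j, value in entries:
        rows[i][j] = value
    return dict(rows)


entries = [(0, 0, 1.2), (6, 4, 2.1)]
indexed = rows_by_index(entries)

# Keeping the indices (as an indexed representation would) preserves the
# gap between row 0 and row 6, so the row count is max index + 1.
print(max(indexed) + 1)  # 7

# Dropping the indices leaves only the rows that actually hold entries.
unindexed_rows = list(indexed.values())
print(len(unindexed_rows))  # 2
```

If the original row positions matter downstream, converting to an IndexedRowMatrix instead keeps them.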

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5

transpose()

Transpose this CoordinateMatrix.

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()
>>> print(mat_transposed.numRows())
2
>>> print(mat_transposed.numCols())
3

New in version 2.0.0.
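In coordinate format, transposition amounts to swapping the row and column index of every entry. A minimal plain-Python sketch of that idea follows; `transpose_entries` is a hypothetical helper operating on (row, column, value) tuples, not the Spark method itself.

```python
def transpose_entries(entries):
    """Swap the row and column index of every (i, j, value) entry."""
    return [(j, i, value) for i, j, value in entries]


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)
print(transposed)  # [(0, 0, 1.2), (0, 1, 2.0), (1, 2, 3.7)]

# The inferred dimensions swap as well: 3 x 2 becomes 2 x 3.
num_rows = max(i for i, _, _ in transposed) + 1
num_cols = max(j for _, j, _ in transposed) + 1
print(num_rows, num_cols)  # 2 3
```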

Attributes Documentation

entries

The entries of this CoordinateMatrix, stored as an RDD of MatrixEntry objects.

>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)