PrefixSpan
class pyspark.mllib.fpm.PrefixSpan
A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth (https://doi.org/10.1109/ICDE.2001.914830).
New in version 1.6.0.
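A minimal usage sketch follows, assuming an already running SparkContext is passed in as `sc`. The helper name `mine_frequent_sequences`, its parameter names, and the sample `sequences` are hypothetical and chosen only for illustration; the calls to `PrefixSpan.train` and `freqSequences()` are the parts that come from this API.

```python
# Sample input: each element is one sequence, and each sequence is a list of itemsets.
sequences = [
    [["a", "b"], ["c"]],
    [["a"], ["c", "b"], ["a", "b"]],
    [["a", "b"], ["e"]],
    [["f"]],
]


def mine_frequent_sequences(sc, data, min_support=0.5, max_pattern_length=5):
    """Hypothetical helper: train PrefixSpan on an in-memory list of sequences.

    `sc` is assumed to be an active pyspark.SparkContext.
    """
    # Imported lazily so that this helper can be defined without Spark installed.
    from pyspark.mllib.fpm import PrefixSpan

    rdd = sc.parallelize(data, 2)
    model = PrefixSpan.train(
        rdd,
        minSupport=min_support,
        maxPatternLength=max_pattern_length,
    )
    # freqSequences() returns an RDD of FreqSequence(sequence, freq) records.
    return sorted(model.freqSequences().collect())
```

With a SparkContext available, `mine_frequent_sequences(sc, sequences)` would return the frequent patterns together with their frequencies; each result exposes the pattern as `.sequence` and its count as `.freq`.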
Methods Documentation
classmethod train(data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- Parameters
data – The input data set; each element contains a sequence of itemsets.
minSupport – The minimal support level of a sequential pattern. Any pattern that appears more than (minSupport * size-of-the-dataset) times will be output. (default: 0.1)
maxPatternLength – The maximal length of a sequential pattern. Only frequent patterns whose length does not exceed maxPatternLength will be output. (default: 10)
maxLocalProjDBSize – The maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run. (default: 32000000)
New in version 1.6.0.
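To make the notion of support concrete, here is a small pure-Python sketch that counts in how many sequences a candidate pattern occurs and compares that count with the relative threshold. It is not how the distributed algorithm works internally; the helpers `contains` and `support`, the sample data, and the candidate list are all hypothetical, and the comparison simply follows the wording of the minSupport description.

```python
def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered sub-sequence.

    Each itemset of the pattern must be a subset of a distinct itemset of the
    sequence, and the matched itemsets must appear in the same order.
    """
    pos = 0
    for wanted in pattern:
        wanted = set(wanted)
        # Advance to the earliest remaining itemset that covers `wanted`.
        while pos < len(sequence) and not wanted <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(sequences, pattern):
    """Count the sequences that contain `pattern`."""
    return sum(1 for seq in sequences if contains(seq, pattern))


sequences = [
    [["a", "b"], ["c"]],
    [["a"], ["c", "b"], ["a", "b"]],
    [["a", "b"], ["e"]],
    [["f"]],
]

min_support = 0.4
threshold = min_support * len(sequences)  # 1.6 occurrences for 4 sequences

candidates = [[["a"]], [["a"], ["c"]], [["a", "b"]], [["f"]], [["c"], ["a"]]]
# Keep candidates that appear more than `threshold` times.
frequent = [p for p in candidates if support(sequences, p) > threshold]

for p in candidates:
    print(p, support(sequences, p))
```

The threshold is relative to the data set size, so the same minSupport value demands more occurrences as the number of input sequences grows.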
-
classmethod