This is an implementation of the French version of the Porter stemmer algorithm, taken from http://snowball.tartarus.org/algorithms/french/stemmer.html
The stemmer function is a drop-in replacement for https://package.elm-lang.org/packages/rluiten/stemmer
When building the ElmTextSearch index, it should be used like this:
ElmTextSearch.newWith
    { indexType = "ElmTextSearch - Customized French index"
    , ref = ..
    , fields = ..
    , listFields = [..]
    , initialTransformFactories = Index.Defaults.defaultInitialTransformFactories
    , transformFactories = [ (\func index -> ( index, func )) (FrenchStemmer.stemmer True) ]
    , filterFactories = [ createFilterFunc FrenchStemmer.frenchStopWords ]
    }
stemmer : Basics.Bool -> String -> String
Runs the stemmer algorithm and returns the stem of the word. The Bool parameter allows for the removal of articles or grammatical particles prefixed to the word and linked to it by an apostrophe.
stemmer True "documentation" == "document"
stemmer True "l'ail" == "ail"
stemmer False "l'ail" == "ail"
debugStemmer : String -> Record
Runs the stemmer algorithm and returns a detailed record containing the word regions used by the algorithm (R1, R2 and RV) and the list of steps that were applied.
debugStemmer "documentation" == { currentR1 = "ument", currentR2 = "ent", currentRV = "cument", input = "document", lastEffective = Just Step1, realised = [Step6,Step5,Step3,Step1] }
frenchStopWords : List String
A list of French stop words to be ignored while indexing a document.
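Outside of an ElmTextSearch index, the two exposed values can also be combined by hand. The following is a minimal sketch rather than part of the package: the module name FrenchTokens and the helper indexableStems are hypothetical, and it assumes that the entries of frenchStopWords are lowercase and that splitting on whitespace is good enough for the input text.

```elm
module FrenchTokens exposing (indexableStems)

import FrenchStemmer


{-| Hypothetical helper (not provided by the package): lowercases the text,
splits it on whitespace, drops the words found in the stop word list and
stems the remaining ones. Assumes the stop word list is lowercase.
-}
indexableStems : String -> List String
indexableStems text =
    text
        |> String.toLower
        |> String.words
        |> List.filter (\word -> not (List.member word FrenchStemmer.frenchStopWords))
        |> List.map (FrenchStemmer.stemmer True)
```

Punctuation is not stripped in this sketch, so a real tokenizer would need an extra cleaning step before the stop word filter.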