eniac314 / french-stemmer / FrenchStemmer

This is an implementation of the French version of the Porter (Snowball) stemming algorithm, as described at http://snowball.tartarus.org/algorithms/french/stemmer.html

The stemmer function is a drop-in replacement for https://package.elm-lang.org/packages/rluiten/stemmer

When building an ElmTextSearch index, it should be used like this:

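import ElmTextSearch
import FrenchStemmer
import Index.Defaults
import StopWordFilter exposing (createFilterFunc)

ElmTextSearch.newWith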
    { indexType = "ElmTextSearch - Customized French index"
    , ref = ..
    , fields = ..
    , listFields = [..]
    , initialTransformFactories = Index.Defaults.defaultInitialTransformFactories
    , transformFactories = [ (\func index -> ( index, func )) (FrenchStemmer.stemmer True) ]
    , filterFactories = [ createFilterFunc FrenchStemmer.frenchStopWords ]
    }
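The transformFactories entry above wraps the plain String -> String stemmer in a function that receives the index and hands it back unchanged, paired with the stemmer. The Elm sketch below isolates that wrapping. Index, asFactory and fakeStem are hypothetical stand-ins (not part of FrenchStemmer or ElmTextSearch); fakeStem only plays the role of FrenchStemmer.stemmer True.

```elm
module FactorySketch exposing (Index(..), asFactory, stemFactory)

-- Hypothetical stand-in for the real elm-text-search index type.


type Index
    = Index String


{-| Same shape as the lambda (\func index -> ( index, func )):
return the index untouched, paired with the given function.
-}
asFactory : func -> Index -> ( Index, func )
asFactory func index =
    ( index, func )


{-| Hypothetical String -> String transform standing in for
FrenchStemmer.stemmer True; it only strips a trailing "ation".
-}
fakeStem : String -> String
fakeStem word =
    if String.endsWith "ation" word then
        String.dropRight 5 word

    else
        word


{-| A factory built from the transform, ready to go in a list
such as transformFactories.
-}
stemFactory : Index -> ( Index, String -> String )
stemFactory =
    asFactory fakeStem
```

Because this factory returns the index as it received it, it adds no state to the index; it only supplies the transform that will be applied to each token.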

Usage

stemmer : Basics.Bool -> String -> String

Runs the stemmer algorithm and returns the stem of the word. The Bool parameter enables the removal of articles or grammatical particles that are prefixed to the word and joined to it by an apostrophe (elisions such as l' or d').

stemmer True "documentation" == "document"

stemmer True "l'ail" == "ail"

stemmer False "l'ail" == "ail"
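The prefix removal described for the Bool parameter can be pictured with a small function. This is a minimal sketch under assumptions: removeElision and its list of elided prefixes are hypothetical and are not taken from FrenchStemmer's source. It also accepts the typographic apostrophe, and it only strips known prefixes so that a word such as aujourd'hui is left intact.

```elm
module ElisionSketch exposing (removeElision)

-- Hypothetical list of elided articles and particles; an assumption,
-- not the list used by FrenchStemmer.


elidedPrefixes : List String
elidedPrefixes =
    [ "l'", "d'", "j'", "m'", "n'", "s'", "t'", "c'", "qu'", "jusqu'", "lorsqu'", "puisqu'" ]


{-| Strip a known elided prefix joined to the word by an apostrophe.
Words without such a prefix are returned unchanged.
-}
removeElision : String -> String
removeElision word =
    let
        -- Treat the typographic apostrophe like the ASCII one.
        normalized =
            String.replace "’" "'" word
    in
    case List.filter (\p -> String.startsWith p normalized) elidedPrefixes of
        prefix :: _ ->
            String.dropLeft (String.length prefix) normalized

        [] ->
            word
```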

debugStemmer : String -> Record

Runs the stemmer algorithm and returns a detailed record describing the word regions used by the algorithm (R1, R2 and RV) and the steps that were performed.

debugStemmer "documentation" == { currentR1 = "ument", currentR2 = "ent", currentRV = "cument", input = "document", lastEffective = Just Step1, realised = [ Step6, Step5, Step3, Step1 ] }
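The regions reported by debugStemmer follow the Snowball definitions: R1 is the part of the word after the first non-vowel that follows a vowel, R2 is the region after the first non-vowel following a vowel inside R1, and RV is the part after the first vowel that is not the first letter (after the third letter when the word starts with two vowels, and after a leading par, col or tap). The Elm sketch below locates these positions. The module and function names are hypothetical, it assumes lowercase input, and it skips Snowball's preliminary step that marks some u, i and y letters as consonants, so it is not FrenchStemmer's implementation.

```elm
module FrenchRegions exposing (r1Start, r2Start, region, rvStart)

-- Sketch only: Snowball French vowels, no preliminary u/i/y marking.


isVowel : Char -> Bool
isVowel c =
    List.member c (String.toList "aeiouyâàëéêèïîôûù")


{-| Position just after the first non-vowel that follows a vowel,
searching from `from`; the word length when there is none.
-}
afterVowelConsonant : Int -> List Char -> Int
afterVowelConsonant from chars =
    let
        go i prevVowel rest =
            case rest of
                [] ->
                    List.length chars

                c :: cs ->
                    if prevVowel && not (isVowel c) then
                        i + 1

                    else
                        go (i + 1) (isVowel c) cs
    in
    go from False (List.drop from chars)


r1Start : String -> Int
r1Start word =
    afterVowelConsonant 0 (String.toList word)


{-| R2 is computed the same way, starting inside R1. -}
r2Start : String -> Int
r2Start word =
    afterVowelConsonant (r1Start word) (String.toList word)


{-| Position just after the first vowel that is not the first letter. -}
afterFirstInnerVowel : List Char -> Int
afterFirstInnerVowel chars =
    chars
        |> List.indexedMap Tuple.pair
        |> List.filter (\( i, c ) -> i >= 1 && isVowel c)
        |> List.head
        |> Maybe.map (\( i, _ ) -> i + 1)
        |> Maybe.withDefault (List.length chars)


rvStart : String -> Int
rvStart word =
    let
        chars =
            String.toList word

        len =
            List.length chars
    in
    if List.any (\p -> String.startsWith p word) [ "par", "col", "tap" ] then
        3

    else
        case chars of
            a :: b :: _ ->
                if isVowel a && isVowel b then
                    -- Two leading vowels: RV starts after the third letter.
                    min 3 len

                else
                    afterFirstInnerVowel chars

            _ ->
                len


{-| The suffix of the word starting at the given position. -}
region : Int -> String -> String
region start word =
    String.fromList (List.drop start (String.toList word))
```

Applied to "document" (the input value in the record above), r1Start, r2Start and rvStart give the regions "ument", "ent" and "cument", matching currentR1, currentR2 and currentRV.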

frenchStopWords : List String

A list of French stop words to be ignored when indexing a document.

Source: https://github.com/stopwords-iso/stopwords-fr
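A stop-word list is applied during indexing as a filter on tokens. The sketch below shows one way to turn such a list into a keep/drop predicate; keepToken and sampleStopWords are hypothetical, and the five-word sample is not the package's list. Converting the list to a Set once gives logarithmic lookups instead of scanning the whole list for every token.

```elm
module StopWordSketch exposing (keepToken, sampleStopWords)

import Set exposing (Set)


-- Hypothetical sample; not FrenchStemmer.frenchStopWords.


sampleStopWords : Set String
sampleStopWords =
    Set.fromList [ "le", "la", "les", "de", "et" ]


{-| True means the token is kept; stop words are dropped. -}
keepToken : Set String -> String -> Bool
keepToken stopWords token =
    not (Set.member token stopWords)
```

With the package's list, Set.fromList FrenchStemmer.frenchStopWords would take the place of sampleStopWords.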