folkertdev / elm-flate / LZ77

LZ77 finds byte sequences that occur more than once and replaces each repeat with a pointer back to an earlier occurrence, so the bytes are stored only once:

LZ77.encode (Encode.encode (Encode.string "aaaaa"))
    |> Array.toList
    --> [ Literal 97, Pointer 4 1 ]

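The character a occurs 5 times, which is encoded as a Literal 97 (emit the byte a once) followed by a Pointer 4 1 (go back 1 byte in the output and copy 4 bytes from there).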

Note that the pointer reads 4 bytes even though, at that point, the output stream has length 1. This works because the bytes are copied one at a time: each copied byte is immediately available as the source for the next, so the single a is repeated four more times. In this way, a pointer whose length exceeds its distance acts as run-length encoding.
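The byte-by-byte copy can be sketched as a small decoder. This is a hypothetical, simplified version that works on List Int instead of Bytes; the module name LZ77DecodeSketch and the helpers step and copy are illustrative, and the Code type only mirrors the documented one, assuming the argument order Pointer length distance used in the example.

```elm
module LZ77DecodeSketch exposing (Code(..), decode)

import Array exposing (Array)


-- Mirrors the documented type (assumed order: Pointer length distance).
type Code
    = Literal Int
    | Pointer Int Int


-- Decode a list of codes into a list of byte values.
decode : List Code -> List Int
decode codes =
    List.foldl step Array.empty codes
        |> Array.toList


step : Code -> Array Int -> Array Int
step code output =
    case code of
        Literal byte ->
            Array.push byte output

        Pointer length distance ->
            copy length distance output


-- Copy one byte at a time. Each pushed byte becomes part of the output,
-- so a pointer may read bytes it has just written (length > distance).
copy : Int -> Int -> Array Int -> Array Int
copy remaining distance output =
    if remaining <= 0 then
        output

    else
        case Array.get (Array.length output - distance) output of
            Just byte ->
                copy (remaining - 1) distance (Array.push byte output)

            Nothing ->
                -- invalid distance (before the start of the output): stop
                output
```

For the example above, decode [ Literal 97, Pointer 4 1 ] produces five 97s: after the literal, every iteration of copy reads the byte that the previous iteration just appended.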

encode : Bytes -> Array Code

Encode bytes as a sequence of LZ77 codes.

decode : Array Code -> Bytes

Decode a sequence of LZ77 codes back into the original bytes.


type Code
    = Literal Basics.Int
    | Pointer Basics.Int Basics.Int

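The codes produced by encoding. Literal byte emits a single byte. Pointer length distance copies length bytes, starting distance bytes back from the end of the output decoded so far.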
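As a small illustration of what each constructor contributes, the hypothetical helper below (not part of the package) computes how many bytes a list of codes expands to, without decoding it. The Code type is redeclared locally and assumes the order Pointer length distance.

```elm
module CodeLengthSketch exposing (Code(..), decodedLength)


-- Local copy of the documented type (assumed order: Pointer length distance).
type Code
    = Literal Int
    | Pointer Int Int


-- Number of bytes the codes expand to when decoded:
-- a Literal emits one byte, a Pointer emits `length` bytes.
decodedLength : List Code -> Int
decodedLength codes =
    List.foldl
        (\code total ->
            case code of
                Literal _ ->
                    total + 1

                Pointer length _ ->
                    total + length
        )
        0
        codes
```

The distance never affects the output size; it only selects where the copied bytes come from.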

encodeWithOptions : { windowSize : Basics.Int } -> Bytes -> Array Code

Encode using the LZ77 encoding, with additional options.

Note: decreasing the window size does not noticeably improve performance in Elm. The bottleneck is keeping track of matches in a large array, and the size of that array does not depend on the window size.
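To show what the window size controls, here is a naive greedy LZ77 encoder sketch. It is not the package's implementation: it works on List Int, the names encodeWithWindow, longestMatch and matchLength are hypothetical, and the match limits (3 to 258 bytes) are assumptions borrowed from DEFLATE. Unlike the package, whose match bookkeeping is the bottleneck, this brute-force search does get slower as the window grows.

```elm
module LZ77EncodeSketch exposing (Code(..), encodeWithWindow)

import Array exposing (Array)


type Code
    = Literal Int
    | Pointer Int Int


-- Assumed limits (from DEFLATE): shorter matches become literals.
minMatch : Int
minMatch =
    3


maxMatch : Int
maxMatch =
    258


-- Greedy: at each position, take the longest match starting at most
-- windowSize bytes back. Brute force, O(n * windowSize * maxMatch).
encodeWithWindow : Int -> List Int -> List Code
encodeWithWindow windowSize bytes =
    encodeHelp windowSize (Array.fromList bytes) 0 []


encodeHelp : Int -> Array Int -> Int -> List Code -> List Code
encodeHelp windowSize input pos acc =
    if pos >= Array.length input then
        List.reverse acc

    else
        let
            ( bestLength, bestDistance ) =
                longestMatch windowSize input pos
        in
        if bestLength >= minMatch then
            encodeHelp windowSize input (pos + bestLength) (Pointer bestLength bestDistance :: acc)

        else
            case Array.get pos input of
                Just byte ->
                    encodeHelp windowSize input (pos + 1) (Literal byte :: acc)

                Nothing ->
                    List.reverse acc


-- ( length, distance ) of the longest match; ties go to the nearest start.
longestMatch : Int -> Array Int -> Int -> ( Int, Int )
longestMatch windowSize input pos =
    List.range (max 0 (pos - windowSize)) (pos - 1)
        |> List.map (\start -> ( matchLength input start pos 0, pos - start ))
        |> List.foldl
            (\( len, dist ) ( bestLen, bestDist ) ->
                if len >= bestLen then ( len, dist ) else ( bestLen, bestDist )
            )
            ( 0, 0 )


-- Count equal bytes at start + k and pos + k. The match may run past pos,
-- which a decoder handles by copying byte by byte.
matchLength : Array Int -> Int -> Int -> Int -> Int
matchLength input start pos k =
    if k >= maxMatch then
        k

    else
        case ( Array.get (start + k) input, Array.get (pos + k) input ) of
            ( Just a, Just b ) ->
                if a == b then
                    matchLength input start pos (k + 1)

                else
                    k

            _ ->
                k
```

With a window of 3, the input abcabc (bytes 97, 98, 99, 97, 98, 99) becomes three literals followed by Pointer 3 3; with a window of 1, the earlier abc is out of reach and all six bytes are emitted as literals.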

maxWindowSize : Basics.Int

Maximum size of the sliding window.