LZ77 finds sequences of bytes that occur multiple times, and stores them only once:
LZ77.encode (Encode.encode (Encode.string "aaaaa"))
--> [ Literal 97, Pointer 4 1 ]
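The character a occurs 5 times, which is encoded as a literal a (Literal 97), followed by a pointer that copies 4 bytes starting 1 position back in the output stream (Pointer 4 1).
Note that the pointer tries to read 4 bytes, even though the output stream at that point only has length 1. This is fine: the elements are copied over one by one, so each copied byte is already in the output stream by the time the next one is read. Used this way, a pointer expresses a run of repeated bytes, which is the idea behind run-length encoding.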
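The one-by-one copy can be sketched as follows. This is a minimal illustration, not the library's implementation: the Code type below is a hypothetical stand-in that mirrors the Literal and Pointer constructors from the example, and it works on a List Int instead of Bytes. It assumes every distance is at least 1.

```elm
module PointerCopy exposing (Code(..), decodeToList)

-- Hypothetical stand-in for the library's code type (an assumption).


type Code
    = Literal Int
    | Pointer Int Int -- length, distance


decodeToList : List Code -> List Int
decodeToList codes =
    List.foldl step [] codes
        |> List.reverse



-- The output is kept reversed, so the most recently written byte is the head.


step : Code -> List Int -> List Int
step code reversedOutput =
    case code of
        Literal byte ->
            byte :: reversedOutput

        Pointer length distance ->
            copy length distance reversedOutput



-- Copy one byte at a time. Every copied byte is pushed onto the output
-- before the next read, so a pointer may read bytes it has just written.


copy : Int -> Int -> List Int -> List Int
copy remaining distance reversedOutput =
    if remaining <= 0 then
        reversedOutput

    else
        case List.head (List.drop (distance - 1) reversedOutput) of
            Just byte ->
                copy (remaining - 1) distance (byte :: reversedOutput)

            Nothing ->
                -- The pointer reaches before the start of the output: stop.
                reversedOutput
```

With this sketch, decodeToList [ Literal 97, Pointer 4 1 ] gives [ 97, 97, 97, 97, 97 ], and decodeToList [ Literal 97, Literal 98, Pointer 4 2 ] gives [ 97, 98, 97, 98, 97, 98 ].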
encode : Bytes -> Array Code
Encode using the LZ77 encoding.
decode : Array Code -> Bytes
Decode using the LZ77 encoding.
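A round trip through both functions might look like the sketch below. It assumes the module is imported as LZ77, as in the example at the top, and that the elm/bytes package is available; roundTrip is a hypothetical helper name.

```elm
module RoundTrip exposing (roundTrip)

import Bytes exposing (Bytes)
import Bytes.Encode as Encode
import LZ77


roundTrip : String -> Bytes
roundTrip text =
    -- Turn the string into bytes, compress to LZ77 codes, then expand again.
    Encode.encode (Encode.string text)
        |> LZ77.encode
        |> LZ77.decode
```

The resulting bytes should have the same contents as the bytes that went into LZ77.encode.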
The codes
Literal: a byte value.
Pointer: a (length, distance) pair. Go distance positions back and read length bytes. Put the read bytes at the end of the output stream.
encodeWithOptions : { windowSize : Basics.Int } -> Bytes -> Array Code
Encode using the LZ77 encoding, with additional options.
windowSize: the maximum distance a Pointer can jump. A bigger window size gives better compression, but requires more data in memory. That is almost never a problem nowadays though, so encode uses the maximum window size that LZ77 supports.
Note: decreasing the window size doesn't change the performance that much in Elm. The bottleneck is in keeping track of matches in a large array, and the size of that array is constant.
maxWindowSize : Basics.Int
Maximum size of a sliding window.
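As a usage sketch, the window size can be passed explicitly. The function names below are hypothetical, the value 1024 is an arbitrary illustration, and the imports assume that Array is the core Array type and that the LZ77 module exposes Code.

```elm
module WindowSize exposing (encodeMaxWindow, encodeSmallWindow)

import Array exposing (Array)
import Bytes exposing (Bytes)
import LZ77 exposing (Code)


encodeSmallWindow : Bytes -> Array Code
encodeSmallWindow bytes =
    -- Pointers may jump back at most 1024 positions (illustrative value).
    LZ77.encodeWithOptions { windowSize = 1024 } bytes


encodeMaxWindow : Bytes -> Array Code
encodeMaxWindow bytes =
    -- Pass the largest supported window explicitly.
    LZ77.encodeWithOptions { windowSize = LZ77.maxWindowSize } bytes
```

Since encode is described as using the maximum window size, encodeMaxWindow would presumably produce the same codes as LZ77.encode.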