Bytes.Parser (zwilias/elm-bytes-parser)

Parse Bytes with custom error reporting and context tracking.

Running parsers


type Parser context error value

A parser which tracks a certain type of context, a certain type of error and produces a certain type of value.

run : Parser context error value -> Bytes -> Result (Error context error) value

Run the given parser on the provided bytes and return the result.

import Bytes.Encode as E
import Bytes.Parser as P


E.string "hello"
    |> E.encode
    |> P.run (P.string 5)
--> Ok "hello"


E.string "hello"
    |> E.encode
    |> P.run (P.string 6)
--> Err (P.OutOfBounds { at = 0, bytes = 6 })


type Error context error
    = InContext ({ label : context, start : Basics.Int }) (Error context error)
    | OutOfBounds ({ at : Basics.Int, bytes : Basics.Int })
    | Custom ({ at : Basics.Int }) error
    | BadOneOf ({ at : Basics.Int }) (List (Error context error))

Describes errors that arise while parsing.

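Custom errors are produced by fail, and context tracking happens through inContext, which wraps errors in InContext. OutOfBounds is reported when a parser tries to read bytes beyond the end of the input, and BadOneOf collects the errors of the alternatives when none of the parsers given to oneOf succeed.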
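To make the shape of Error concrete, here is a sketch of a hypothetical errorToString helper (it is not part of Bytes.Parser) that pattern matches on every constructor. It assumes that both the context and the custom error type are String.

```elm
import Bytes.Parser as P


-- Hypothetical helper, not part of Bytes.Parser: render an Error as a message.
-- Both the context and the custom error are assumed to be Strings here.
errorToString : P.Error String String -> String
errorToString err =
    case err of
        P.InContext { label, start } inner ->
            "in " ++ label ++ " (starting at offset " ++ String.fromInt start ++ "): " ++ errorToString inner

        P.OutOfBounds { at, bytes } ->
            "could not read " ++ String.fromInt bytes ++ " byte(s) at offset " ++ String.fromInt at

        P.Custom { at } message ->
            message ++ " (at offset " ++ String.fromInt at ++ ")"

        P.BadOneOf { at } errors ->
            "none of the " ++ String.fromInt (List.length errors) ++ " alternatives succeeded at offset " ++ String.fromInt at


errorToString (P.InContext { label = "header", start = 0 } (P.OutOfBounds { at = 0, bytes = 6 }))
--> "in header (starting at offset 0): could not read 6 byte(s) at offset 0"
```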

Static parsers

succeed : value -> Parser context error value

Always succeed with the given value.

import Bytes.Encode as E
import Bytes.Parser as P


E.encode (E.sequence [])
    |> P.run (P.succeed "hi there")
--> Ok "hi there"

fail : error -> Parser context error value

A Parser that always fails with the given error.

import Bytes.Encode as E
import Bytes.Parser as P


type Error = SomeFailure


E.sequence []
    |> E.encode
    |> P.run (P.fail SomeFailure)
--> Err (P.Custom { at = 0 } SomeFailure)

Important note about using fail in andThen:

The offset that the Custom error is tagged with is the offset the parser is at when fail runs. When this happens inside an andThen, be aware that something was already read, since that is what triggered the andThen in the first place.

For example, consider this:

E.unsignedInt8 1
    |> E.encode
    |> P.run (P.andThen (\_ -> P.fail "fail") P.unsignedInt8)
--> Err (P.Custom { at = 1 } "fail")

We may have intended the failure to be about the byte we just read, and expected the offset to be the one "before" reading that byte. That's not quite what andThen means, though! andThen runs only after the first parser has succeeded, so its input has already been consumed.
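If the offset from before the read is what you want to report, one option is to wrap the parser in inContext, whose start field records the offset at which the wrapped parser began. A minimal sketch, where the "byte" label is just an illustrative String context:

```elm
import Bytes.Encode as E
import Bytes.Parser as P


-- inContext records where the wrapped parser started (offset 0),
-- while Custom still reports the offset after the byte was read (offset 1).
E.unsignedInt8 1
    |> E.encode
    |> P.run (P.inContext "byte" (P.andThen (\_ -> P.fail "fail") P.unsignedInt8))
--> Err (P.InContext { label = "byte", start = 0 } (P.Custom { at = 1 } "fail"))
```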

inContext : context -> Parser context error value -> Parser context error value

Add context to errors that may occur during parsing.

Adding context makes it easier to debug where issues occur.

import Bytes.Encode as E
import Bytes.Parser as P


type Context = Header | DataArea


E.sequence []
    |> E.encode
    |> P.run (P.inContext Header P.unsignedInt8)
--> Err
-->    (P.InContext
-->        { label = Header
-->        , start = 0
-->        }
-->        (P.OutOfBounds { at = 0, bytes = 1 })
-->    )
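Contexts nest, so an error deep inside a structure carries the full path of labels that led to it. The sketch below uses a hypothetical contextLabels helper (not part of the package) to collect those labels, outermost first; the record layout and the String labels are assumptions for illustration.

```elm
import Bytes
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


-- Hypothetical helper: collect the labels of all nested InContext wrappers,
-- outermost first, ignoring whatever error sits at the bottom.
contextLabels : P.Error context error -> List context
contextLabels err =
    case err of
        P.InContext { label } inner ->
            label :: contextLabels inner

        _ ->
            []


-- Hypothetical layout: a one-byte version followed by a 16-bit payload.
record : Parser String e ( Int, Int )
record =
    P.inContext "record"
        (P.succeed Tuple.pair
            |> P.keep P.unsignedInt8
            |> P.keep (P.inContext "payload" (P.unsignedInt16 Bytes.BE))
        )


-- Only two bytes: the version, plus half of the payload.
[ E.unsignedInt8 1, E.unsignedInt8 0 ]
    |> E.sequence
    |> E.encode
    |> P.run record
    |> Result.mapError contextLabels
--> Err [ "record", "payload" ]
```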

Basic parsers

Integers

unsignedInt8 : Parser context error Basics.Int

Parse one byte into an integer from 0 to 255.

unsignedInt16 : Bytes.Endianness -> Parser context error Basics.Int

Parse two bytes into an integer from 0 to 65535.

unsignedInt32 : Bytes.Endianness -> Parser context error Basics.Int

Parse four bytes into an integer from 0 to 4294967295.

signedInt8 : Parser context error Basics.Int

Parse one byte into an integer from -128 to 127.

signedInt16 : Bytes.Endianness -> Parser context error Basics.Int

Parse two bytes into an integer from -32768 to 32767.

signedInt32 : Bytes.Endianness -> Parser context error Basics.Int

Parse four bytes into an integer from -2147483648 to 2147483647.
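The multi-byte parsers take a Bytes.Endianness, which decides which byte is the most significant one, and the signed variants interpret the same bits as two's complement. A small sketch of both effects, with twoBytes as an illustrative name:

```elm
import Bytes
import Bytes.Encode as E
import Bytes.Parser as P


-- The same two bytes, 0x01 0x02, read with either byte order.
twoBytes : Bytes.Bytes
twoBytes =
    [ 0x01, 0x02 ]
        |> List.map E.unsignedInt8
        |> E.sequence
        |> E.encode


P.run (P.unsignedInt16 Bytes.BE) twoBytes
--> Ok 258


P.run (P.unsignedInt16 Bytes.LE) twoBytes
--> Ok 513


-- A single 0xFF byte is 255 unsigned, but -1 signed.
E.encode (E.unsignedInt8 0xFF)
    |> P.run P.signedInt8
--> Ok (-1)
```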

Floats

float32 : Bytes.Endianness -> Parser context error Basics.Float

Parse 4 bytes into a Float.

float64 : Bytes.Endianness -> Parser context error Basics.Float

Parse 8 bytes into a Float.
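Elm's Float is a 64-bit double, so float64 round-trips any Float, while float32 only round-trips values that fit in 32 bits. A short sketch of the difference:

```elm
import Bytes
import Bytes.Encode as E
import Bytes.Parser as P


-- 1.5 is exactly representable in 32 bits, so it survives the round trip.
E.encode (E.float32 Bytes.BE 1.5)
    |> P.run (P.float32 Bytes.BE)
--> Ok 1.5


-- 0.1 is not, so reading it back from 4 bytes gives a slightly different Float.
E.encode (E.float32 Bytes.BE 0.1)
    |> P.run (P.float32 Bytes.BE)
    |> Result.map ((==) 0.1)
--> Ok False


-- With 8 bytes, 0.1 round-trips exactly.
E.encode (E.float64 Bytes.BE 0.1)
    |> P.run (P.float64 Bytes.BE)
--> Ok 0.1
```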

Strings

string : Basics.Int -> Parser context error String

Parse the given number of bytes, representing UTF-8 characters, into a String.

Note that Elm strings use UTF-16. As a result, String.length counts UTF-16 code units and will not always agree with the number of bytes that went into the string!

import Bytes.Encode as E
import Bytes.Parser as P


[ 0xF0, 0x9F, 0x91, 0x8D ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run (P.string 4)
--> Ok "👍"
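When you need the byte count of a String (for example, to write a length prefix), Bytes.Encode.getStringWidth gives the UTF-8 width rather than the UTF-16 length. A quick comparison:

```elm
import Bytes.Encode as E
import Bytes.Parser as P


-- "👍" takes 4 bytes in UTF-8 but 2 UTF-16 code units in an Elm String.
String.length "👍"
--> 2


E.getStringWidth "👍"
--> 4


-- getStringWidth gives the byte count to pass to P.string.
E.encode (E.string "👍")
    |> P.run (P.string (E.getStringWidth "👍"))
--> Ok "👍"
```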

Bytes

bytes : Basics.Int -> Parser context error Bytes

Parse the given number of bytes as raw Bytes.
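A common use is reading a length-prefixed blob without decoding its contents. A minimal sketch, assuming a one-byte length prefix:

```elm
import Bytes
import Bytes.Encode as E
import Bytes.Parser as P


-- A length byte (3) followed by that many bytes of payload.
[ 3, 0xAA, 0xBB, 0xCC ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run (P.unsignedInt8 |> P.andThen P.bytes)
    |> Result.map Bytes.width
--> Ok 3
```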

Transforming values

map : (a -> b) -> Parser context error a -> Parser context error b

Transform the value a parser produces.

import Bytes.Encode as E
import Bytes.Parser as P


E.string "hello"
    |> E.encode
    |> P.run (P.map String.length (P.string 5))
--> Ok 5

map2 : (x -> y -> z) -> Parser context error x -> Parser context error y -> Parser context error z

Combine what 2 parsers produce into a single parser.

import Bytes exposing (Bytes)
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


input : Bytes
input =
    [ E.unsignedInt8 3
    , E.string "wat"
    ]
        |> E.sequence
        |> E.encode


map2Example : Parser c e String
map2Example =
    P.map2 String.repeat P.unsignedInt8 (P.string 3)


P.run map2Example input
--> Ok "watwatwat"

Note that the effect of map2 (and, in fact, every map variation) can also be achieved using a combination of succeed and keep.

equivalent : Parser c e String
equivalent =
    P.succeed String.repeat
        |> P.keep P.unsignedInt8
        |> P.keep (P.string 3)

P.run equivalent input
--> Ok "watwatwat"

map3 : (w -> x -> y -> z) -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z

map4 : (v -> w -> x -> y -> z) -> Parser context error v -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z

map5 : (u -> v -> w -> x -> y -> z) -> Parser context error u -> Parser context error v -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z
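map3, map4 and map5 work like map2, running their parsers in order and combining three, four or five values. A sketch with a hypothetical Rgb record, using the record constructor as the combining function:

```elm
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


-- Hypothetical type: one byte per color channel.
type alias Rgb =
    { red : Int, green : Int, blue : Int }


rgb : Parser c e Rgb
rgb =
    P.map3 Rgb P.unsignedInt8 P.unsignedInt8 P.unsignedInt8


[ 255, 128, 0 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run rgb
--> Ok { red = 255, green = 128, blue = 0 }
```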

Combining parsers

keep : Parser context error a -> Parser context error (a -> b) -> Parser context error b

Keep the value produced by a parser in a pipeline.

Together with succeed and ignore, this allows writing pretty flexible parsers in a straightforward manner: the order in which things are parsed is apparent.

import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)

parser : Parser c e (Int, Int)
parser =
    P.succeed Tuple.pair
        |> P.keep P.unsignedInt8
        |> P.ignore P.unsignedInt8
        |> P.keep P.unsignedInt8

[ E.unsignedInt8 12
, E.unsignedInt8 3
, E.unsignedInt8 45
]
    |> E.sequence
    |> E.encode
    |> P.run parser
--> Ok ( 12, 45 )

ignore : Parser context error ignore -> Parser context error keep -> Parser context error keep

Ignore the value produced by a parser.

Note that the parser must still succeed for the pipeline to succeed. This means you can use it to check a value without keeping it.

import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


type Error = Mismatch { expected : Int, actual : Int }


match : Int -> Parser c Error Int
match expected =
    P.unsignedInt8
        |> P.andThen
            (\actual ->
                if expected == actual then
                    P.succeed actual
                else
                    P.fail (Mismatch { expected = expected, actual = actual })
            )

parser : Parser c Error ()
parser =
    P.succeed ()
        |> P.ignore (match 66)


E.unsignedInt8 66
    |> E.encode
    |> P.run parser
--> Ok ()


E.unsignedInt8 44
    |> E.encode
    |> P.run parser
--> Mismatch { expected = 66, actual = 44 }
-->   |> P.Custom { at = 1 }
-->   |> Err

skip : Basics.Int -> Parser context error value -> Parser context error value

Skip the given number of bytes in a pipeline.

This is similar to ignore, but rather than parsing a value and discarding it, this skips over the bytes without decoding them at all.
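For example, to jump over some reserved bytes before the value you care about (the two-byte padding here is an assumed layout, and afterPadding is an illustrative name):

```elm
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


-- Hypothetical layout: two reserved bytes, then the byte we want.
afterPadding : Parser c e Int
afterPadding =
    P.succeed identity
        |> P.skip 2
        |> P.keep P.unsignedInt8


[ 0xFF, 0xFF, 42 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run afterPadding
--> Ok 42
```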

Fancy parsers

andThen : (a -> Parser context error b) -> Parser context error a -> Parser context error b

Parse one thing, and then parse another thing based on the first thing.

This is very useful to make the content of your data drive your parser. As an example, consider a string encoded as the length of the string, followed by the actual data:

import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


string : Parser c e String
string =
    P.unsignedInt8 |> P.andThen P.string


[ E.unsignedInt8 5
, E.string "hello"
]
    |> E.sequence
    |> E.encode
    |> P.run string
--> Ok "hello"

oneOf : List (Parser context error value) -> Parser context error value

Try each parser in turn, and succeed with the first one that succeeds.

Note that this backtracks: if a parser fails after consuming some input, the next parser starts again from the original offset.
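A sketch of the backtracking behaviour, using a hypothetical tagged helper that reads one byte and only succeeds if it matches. The first alternative consumes a byte before failing, yet the second alternative still sees that same byte.

```elm
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


-- Hypothetical helper: read one byte and succeed with `value` only if it equals `tag`.
tagged : Int -> a -> Parser c String a
tagged tag value =
    P.unsignedInt8
        |> P.andThen
            (\byte ->
                if byte == tag then
                    P.succeed value

                else
                    P.fail ("expected tag " ++ String.fromInt tag)
            )


shape : Parser c String String
shape =
    P.oneOf
        [ tagged 1 "circle"
        , tagged 2 "square"
        ]


-- The first alternative reads the byte and fails; oneOf backtracks,
-- so the second alternative reads that same byte again.
E.unsignedInt8 2
    |> E.encode
    |> P.run shape
--> Ok "square"


-- When every alternative fails, so does oneOf.
E.unsignedInt8 3
    |> E.encode
    |> P.run shape
    |> Result.toMaybe
--> Nothing
```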

repeat : Parser context error value -> Basics.Int -> Parser context error (List value)

Repeat a given parser count times.

The count is the last argument because it is common to read the number of repetitions from the input itself, which lets repeat be passed straight to andThen.

import Bytes.Encode as E
import Bytes.Parser as P


intList : P.Parser c e (List Int)
intList =
    P.unsignedInt8 |> P.andThen (P.repeat P.unsignedInt8)


[ 5, 0, 1, 2, 3, 4 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run intList
--> Ok [ 0, 1, 2, 3, 4 ]


type Step state a
    = Loop state
    | Done a

Represent the next step of a loop: either continue looping with some new internal state, or finish while producing a value.

loop : (state -> Parser context error (Step state a)) -> state -> Parser context error a

Loop a parser until it declares it is done looping.

The first argument is a function which, given some state, will usually parse some stuff and indicate it wants to either continue, or declare it is done and produce the final value. The second argument is the initial state for the loop.

This particular order of parameters was chosen to make it somewhat easier to produce the initial state using a parser (which seems to be a fairly common use case) and to hint at the mental model, which isn't unlike a fold.

import Bytes.Encode as E
import Bytes.Parser as P

nullTerminatedString_ : (Int, P.Position) -> P.Parser c e (P.Step (Int, P.Position) String)
nullTerminatedString_ ( count, startPos ) =
    P.unsignedInt8
        |> P.andThen
            (\byte ->
                if byte == 0x00 then
                    P.string count
                        |> P.randomAccess { offset = 0, relativeTo = startPos }
                        |> P.map P.Done
                else
                    P.succeed (P.Loop ( count + 1, startPos ))
            )

nullTerminatedString : P.Parser c e String
nullTerminatedString =
    P.map (Tuple.pair 0) P.position
        |> P.andThen (P.loop nullTerminatedString_)


[ E.string "hello world!"
, E.unsignedInt8 0
]
    |> E.sequence
    |> E.encode
    |> P.run nullTerminatedString
--> Ok "hello world!"
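For a smaller illustration of the fold-like mental model, here is a sketch that sums bytes until it reaches a 0x00 terminator, with the running total as the loop state. The names sumStep and sumUntilZero are hypothetical.

```elm
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)


-- One iteration: read a byte, stop on 0x00, otherwise add it to the running total.
sumStep : Int -> Parser c e (P.Step Int Int)
sumStep total =
    P.unsignedInt8
        |> P.map
            (\byte ->
                if byte == 0 then
                    P.Done total

                else
                    P.Loop (total + byte)
            )


sumUntilZero : Parser c e Int
sumUntilZero =
    P.loop sumStep 0


-- The trailing 99 comes after the terminator, so it is never read.
[ 1, 2, 3, 0, 99 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run sumUntilZero
--> Ok 6
```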

Random access


type Position

A concrete position in the input.

position : Parser context error Position

Produce the current offset in the input.

import Bytes.Encode as E
import Bytes.Parser as P


E.encode (E.string "hello")
    |> P.run P.position
--> Ok P.startOfInput


parser : P.Parser c e P.Position
parser =
    P.succeed identity
        |> P.skip 2
        |> P.keep P.position


E.encode (E.string "hello")
    |> P.run parser
    |> Result.map ((==) P.startOfInput)
--> Ok False

startOfInput : Position

Position signifying the start of input.

This is mostly useful when feeding absolute offsets to randomAccess.

randomAccess : { offset : Basics.Int, relativeTo : Position } -> Parser context error value -> Parser context error value

Read some data based on an offset.

This is meant for "out of band" reading: the resulting parser resumes reading where you left off before the random access.

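As an example, consider some data where the first byte holds the length of a string, the second byte holds the absolute offset of that string, the third byte holds another number we're interested in, and the string itself comes after 12 bytes of padding, at offset 15. This can be represented like so: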

import Bytes exposing (Bytes)
import Bytes.Encode as E

input : Bytes
input =
    [ E.unsignedInt8 5 -- length of the string we're interested in
    , E.unsignedInt8 15 -- absolute offset to the string
    , E.unsignedInt8 6 -- another number we're interested in
    , E.string (String.repeat 12 "\u{0000}") -- buffer. Its content is irrelevant.
    , E.string "hello" -- our actual string
    ]
        |> E.sequence
        |> E.encode

Now, to decode this, let's first try decoding the String by decoding the length and offset, and then reading the data:

import Bytes.Parser as P exposing (Parser)


string : Parser c e String
string =
    P.succeed Tuple.pair
        |> P.keep P.unsignedInt8
        |> P.keep P.unsignedInt8
        |> P.andThen readStringWithLengthAndOffset


readStringWithLengthAndOffset : ( Int, Int ) -> Parser c e String
readStringWithLengthAndOffset ( length, offset ) =
    P.randomAccess
        { offset = offset, relativeTo = P.startOfInput }
        (P.string length)


P.run string input
--> Ok "hello"

Now, to illustrate the "resume" behaviour, let's use the above parser, and also read the interesting number:

final : Parser c e { string : String, number : Int }
final =
    P.succeed (\s n -> { string = s, number = n })
        |> P.keep string
        |> P.keep P.unsignedInt8


P.run final input
--> Ok { string = "hello", number = 6 }

The trick here is that parsing continues sequentially: the randomAccess parser reads out of band and does not move the main parser's offset.

If the offset is relative rather than absolute, we can use a similar setup, using position to capture the position the offset is relative to.

relativeString : Parser c e String
relativeString =
    P.succeed readRelativeString
        |> P.keep P.unsignedInt8
        |> P.keep P.position
        |> P.keep P.unsignedInt8
        |> P.andThen identity

readRelativeString : Int -> P.Position -> Int -> Parser c e String
readRelativeString length marker offset =
    P.randomAccess
        { offset = offset, relativeTo = marker }
        (P.string length)
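To see relativeString in action, here is some sample input, reusing relativeString and readRelativeString from above. Because position is captured right after the length byte, the offset is counted from offset 1, the location of the offset byte itself; the padding bytes are an illustrative assumption.

```elm
import Bytes.Encode as E
import Bytes.Parser as P


-- Byte 0: length (5). Byte 1: offset (3), relative to the position captured
-- just before it was read (offset 1). Bytes 2-3: padding. Bytes 4-8: "hello".
[ E.unsignedInt8 5
, E.unsignedInt8 3
, E.unsignedInt8 0
, E.unsignedInt8 0
, E.string "hello"
]
    |> E.sequence
    |> E.encode
    |> P.run relativeString
--> Ok "hello"
```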