Parse Bytes with custom error reporting and context tracking.
A parser which tracks a certain type of context, a certain type of error and produces a certain type of value.
run : Parser context error value -> Bytes -> Result (Error context error) value
Run the given parser on the provided bytes and return the result.
import Bytes.Encode as E
import Bytes.Parser as P
E.string "hello"
    |> E.encode
    |> P.run (P.string 5)
--> Ok "hello"

E.string "hello"
    |> E.encode
    |> P.run (P.string 6)
--> Err (P.OutOfBounds { at = 0, bytes = 6 })
Describes errors that arise while parsing.
Custom errors happen through fail; context tracking happens through inContext.
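As a rough sketch of how such an error might be inspected, the helper below unwraps any context layers to find the offset at which parsing failed. `offsetOf` is a hypothetical name and not part of the library. It matches only the constructors that appear in the examples in these docs; the wildcard branch is an assumption that covers any constructor not shown here.

```elm
import Bytes.Parser as P

-- Hypothetical helper: find the offset of the underlying failure.
offsetOf : P.Error context error -> Maybe Int
offsetOf err =
    case err of
        P.InContext _ inner ->
            -- Context wrappers carry the actual error inside them.
            offsetOf inner

        P.OutOfBounds { at } ->
            Just at

        P.Custom { at } _ ->
            Just at

        _ ->
            -- Assumption: any constructor not shown in these docs.
            Nothing

offsetOf (P.OutOfBounds { at = 0, bytes = 6 })
--> Just 0
```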
succeed : value -> Parser context error value
Always succeed with the given value.
import Bytes.Encode as E
import Bytes.Parser as P
E.encode (E.sequence [])
    |> P.run (P.succeed "hi there")
--> Ok "hi there"
fail : error -> Parser context error value
A Parser that always fails with the given error.
import Bytes.Encode as E
import Bytes.Parser as P
type Error = SomeFailure

E.sequence []
    |> E.encode
    |> P.run (P.fail SomeFailure)
--> Err (P.Custom { at = 0 } SomeFailure)
Important note about using fail in andThen:
The offset that the Custom constructor of Error is tagged with is the offset the parser is at when fail is executed. When this happens inside an andThen, be aware that something was already read in order for there to be an andThen in the first place.
For example, consider this:
E.unsignedInt8 1
    |> E.encode
    |> P.run (P.andThen (\_ -> P.fail "fail") P.unsignedInt8)
--> Err (P.Custom { at = 1 } "fail")
We may have intended for the failure to be about the byte we just read, and expected the offset to be "before" reading that byte. That's not quite what andThen means, though! andThen means we parsed something successfully already!
inContext : context -> Parser context error value -> Parser context error value
Add context to errors that may occur during parsing.
Adding context makes it easier to debug where issues occur.
import Bytes.Encode as E
import Bytes.Parser as P
type Context = Header | DataArea
E.sequence []
    |> E.encode
    |> P.run (P.inContext Header P.unsignedInt8)
--> Err
-->     (P.InContext
-->         { label = Header
-->         , start = 0
-->         }
-->         (P.OutOfBounds { at = 0, bytes = 1 })
-->     )
unsignedInt8 : Parser context error Basics.Int
Parse one byte into an integer from 0 to 255.
unsignedInt16 : Bytes.Endianness -> Parser context error Basics.Int
Parse two bytes into an integer from 0 to 65535.
unsignedInt32 : Bytes.Endianness -> Parser context error Basics.Int
Parse four bytes into an integer from 0 to 4294967295.
signedInt8 : Parser context error Basics.Int
Parse one byte into an integer from -128 to 127.
signedInt16 : Bytes.Endianness -> Parser context error Basics.Int
Parse two bytes into an integer from -32768 to 32767.
signedInt32 : Bytes.Endianness -> Parser context error Basics.Int
Parse four bytes into an integer from -2147483648 to 2147483647.
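The effect of the Endianness argument, and the difference between signed and unsigned reads, can be sketched as follows. This is an illustrative example in the style of the others in these docs; it uses only the encoders from Bytes.Encode and the parsers listed above.

```elm
import Bytes exposing (Endianness(..))
import Bytes.Encode as E
import Bytes.Parser as P

-- 0x0102 encoded big-endian is the byte sequence 0x01 0x02.
E.unsignedInt16 BE 0x0102
    |> E.encode
    |> P.run (P.unsignedInt16 BE)
--> Ok 258

-- Reading the same two bytes little-endian gives 0x0201.
E.unsignedInt16 BE 0x0102
    |> E.encode
    |> P.run (P.unsignedInt16 LE)
--> Ok 513

-- The byte 0xFF is 255 when unsigned, but -1 when read as signed.
E.unsignedInt8 0xFF
    |> E.encode
    |> P.run P.signedInt8
--> Ok (-1)
```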
float32 : Bytes.Endianness -> Parser context error Basics.Float
Parse 4 bytes into a Float.
float64 : Bytes.Endianness -> Parser context error Basics.Float
Parse 8 bytes into a Float.
string : Basics.Int -> Parser context error String
Parse count bytes representing UTF-8 characters into a String.
Note that Elm strings use UTF-16. As a result, the String.length will not always agree with the number of bytes that went into it!
import Bytes.Encode as E
import Bytes.Parser as P
[ 0xF0, 0x9F, 0x91, 0x8D ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run (P.string 4)
--> Ok "👍"
bytes : Basics.Int -> Parser context error Bytes
Parse count bytes as Bytes.
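A small sketch of bytes in use: since Bytes values are not easy to show directly, this example maps the result to its width using Bytes.width from elm/bytes. Only the first three of the five encoded bytes are consumed.

```elm
import Bytes
import Bytes.Encode as E
import Bytes.Parser as P

E.string "hello"
    |> E.encode
    |> P.run (P.map Bytes.width (P.bytes 3))
--> Ok 3
```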
map : (a -> b) -> Parser context error a -> Parser context error b
Transform the value a parser produces.
import Bytes.Encode as E
import Bytes.Parser as P
E.string "hello"
    |> E.encode
    |> P.run (P.map String.length (P.string 5))
--> Ok 5
map2 : (x -> y -> z) -> Parser context error x -> Parser context error y -> Parser context error z
Combine what 2 parsers produce into a single parser.
import Bytes exposing (Bytes)
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)
input : Bytes
input =
    [ E.unsignedInt8 3
    , E.string "wat"
    ]
        |> E.sequence
        |> E.encode

map2Example : Parser c e String
map2Example =
    P.map2 String.repeat P.unsignedInt8 (P.string 3)

P.run map2Example input
--> Ok "watwatwat"
Note that the effect of map2 (and, in fact, every map variation) can also be achieved using a combination of succeed and keep.
equivalent : Parser c e String
equivalent =
    P.succeed String.repeat
        |> P.keep P.unsignedInt8
        |> P.keep (P.string 3)

P.run equivalent input
--> Ok "watwatwat"
map3 : (w -> x -> y -> z) -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z
map4 : (v -> w -> x -> y -> z) -> Parser context error v -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z
map5 : (u -> v -> w -> x -> y -> z) -> Parser context error u -> Parser context error v -> Parser context error w -> Parser context error x -> Parser context error y -> Parser context error z
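The larger map variations carry no description of their own. Assuming they behave like map2 with more parsers, a minimal sketch of map3 could look like this. The `Rgb` record alias is hypothetical and exists only for this illustration.

```elm
import Bytes.Encode as E
import Bytes.Parser as P

-- Hypothetical record used only for this example.
type alias Rgb =
    { red : Int, green : Int, blue : Int }

[ 255, 128, 0 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run (P.map3 Rgb P.unsignedInt8 P.unsignedInt8 P.unsignedInt8)
--> Ok { red = 255, green = 128, blue = 0 }
```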
keep : Parser context error a -> Parser context error (a -> b) -> Parser context error b
Keep the value produced by a parser in a pipeline.
Together with succeed and ignore, this allows writing pretty flexible parsers in a straightforward manner: the order in which things are parsed is apparent.
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)
parser : Parser c e ( Int, Int )
parser =
    P.succeed Tuple.pair
        |> P.keep P.unsignedInt8
        |> P.ignore P.unsignedInt8
        |> P.keep P.unsignedInt8

[ E.unsignedInt8 12
, E.unsignedInt8 3
, E.unsignedInt8 45
]
    |> E.sequence
    |> E.encode
    |> P.run parser
--> Ok ( 12, 45 )
ignore : Parser context error ignore -> Parser context error keep -> Parser context error keep
Ignore the value produced by a parser.
Note that the parser must still succeed for the pipeline to succeed. This means you can use this for checking the value of something, without using the value.
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)
type Error = Mismatch { expected : Int, actual : Int }

match : Int -> Parser c Error Int
match expected =
    P.unsignedInt8
        |> P.andThen
            (\actual ->
                if expected == actual then
                    P.succeed actual

                else
                    P.fail (Mismatch { expected = expected, actual = actual })
            )

parser : Parser c Error ()
parser =
    P.succeed ()
        |> P.ignore (match 66)

E.unsignedInt8 66
    |> E.encode
    |> P.run parser
--> Ok ()

E.unsignedInt8 44
    |> E.encode
    |> P.run parser
--> Mismatch { expected = 66, actual = 44 }
-->   |> P.Custom { at = 1 }
-->   |> Err
skip : Basics.Int -> Parser context error value -> Parser context error value
Skip a number of bytes in a pipeline.
This is similar to ignore, but rather than parsing a value and discarding it, this skips the given number of bytes altogether.
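As a minimal sketch of skip in a pipeline, the parser below skips the first two bytes and keeps the third. It follows the same succeed, skip and keep pattern used in the position example elsewhere in these docs.

```elm
import Bytes.Encode as E
import Bytes.Parser as P

[ 1, 2, 3 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run
        (P.succeed identity
            |> P.skip 2
            |> P.keep P.unsignedInt8
        )
--> Ok 3
```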
andThen : (a -> Parser context error b) -> Parser context error a -> Parser context error b
Parse one thing, and then parse another thing based on the first thing.
This is very useful to make the content of your data drive your parser. As an example, consider a string encoded as the length of the string, followed by the actual data:
import Bytes.Encode as E
import Bytes.Parser as P exposing (Parser)
string : Parser c e String
string =
    P.unsignedInt8 |> P.andThen P.string

[ E.unsignedInt8 5
, E.string "hello"
]
    |> E.sequence
    |> E.encode
    |> P.run string
--> Ok "hello"
oneOf : List (Parser context error value) -> Parser context error value
Tries a bunch of parsers and succeeds with the first one to succeed.
Note that this uses backtracking when a parser fails after making some progress.
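The backtracking behaviour can be sketched with two alternatives that each read a tag byte. The `tagged` helper is hypothetical and only serves this illustration. The first alternative reads the byte and then fails; the second starts again from the same offset and succeeds.

```elm
import Bytes.Encode as E
import Bytes.Parser as P

-- Hypothetical helper: read one byte and succeed with `label`
-- only if the byte equals `tag`.
tagged : Int -> String -> P.Parser c String String
tagged tag label =
    P.unsignedInt8
        |> P.andThen
            (\actual ->
                if actual == tag then
                    P.succeed label

                else
                    P.fail "unexpected tag"
            )

E.unsignedInt8 2
    |> E.encode
    |> P.run (P.oneOf [ tagged 1 "one", tagged 2 "two" ])
--> Ok "two"
```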
repeat : Parser context error value -> Basics.Int -> Parser context error (List value)
Repeat a given parser count times.
The order of arguments is based on the common occurrence of reading the number of times to repeat something through a parser.
import Bytes.Encode as E
import Bytes.Parser as P
intList : P.Parser c e (List Int)
intList =
    P.unsignedInt8 |> P.andThen (P.repeat P.unsignedInt8)

[ 5, 0, 1, 2, 3, 4 ]
    |> List.map E.unsignedInt8
    |> E.sequence
    |> E.encode
    |> P.run intList
--> Ok [ 0, 1, 2, 3, 4 ]
Represent the next step of a loop: Either continue looping with some new internal state, or finish while producing a value.
loop : (state -> Parser context error (Step state a)) -> state -> Parser context error a
Loop a parser until it declares it is done looping.
The first argument is a function which, given some state, will usually parse some stuff and indicate it wants to either continue, or declare it is done and produce the final value. The second argument is the initial state for the loop.
This particular order of parameters was chosen to make it somewhat easier to produce the initial state using a parser (which seems to be a fairly common use case) and to hint at the mental model, which isn't unlike a fold.
import Bytes.Encode as E
import Bytes.Parser as P
nullTerminatedString_ : ( Int, P.Position ) -> P.Parser c e (P.Step ( Int, P.Position ) String)
nullTerminatedString_ ( count, startPos ) =
    P.unsignedInt8
        |> P.andThen
            (\byte ->
                if byte == 0x00 then
                    P.string count
                        |> P.randomAccess { offset = 0, relativeTo = startPos }
                        |> P.map P.Done

                else
                    P.succeed (P.Loop ( count + 1, startPos ))
            )

nullTerminatedString : P.Parser c e String
nullTerminatedString =
    P.map (Tuple.pair 0) P.position
        |> P.andThen (P.loop nullTerminatedString_)

[ E.string "hello world!"
, E.unsignedInt8 0
]
    |> E.sequence
    |> E.encode
    |> P.run nullTerminatedString
--> Ok "hello world!"
A concrete position in the input.
position : Parser context error Position
Produce the current offset in the input.
import Bytes.Encode as E
import Bytes.Parser as P
E.encode (E.string "hello")
    |> P.run P.position
--> Ok P.startOfInput

parser : P.Parser c e P.Position
parser =
    P.succeed identity
        |> P.skip 2
        |> P.keep P.position

E.encode (E.string "hello")
    |> P.run parser
    |> Result.map ((==) P.startOfInput)
--> Ok False
startOfInput : Position
Position signifying the start of input.
This is mostly useful when feeding absolute offsets to randomAccess.
randomAccess : { offset : Basics.Int, relativeTo : Position } -> Parser context error value -> Parser context error value
Read some data based on an offset.
This is meant for "out of band" reading - the resulting parser will resume reading where you left off.
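As an example, consider data laid out as follows: one byte holding the length of a string, one byte holding the absolute offset to that string, one byte holding another number we're interested in, 12 bytes of buffer, and finally the string itself.
Which can be represented like so: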
import Bytes exposing (Bytes)
import Bytes.Encode as E
input : Bytes
input =
    [ E.unsignedInt8 5 -- length of the string we're interested in
    , E.unsignedInt8 15 -- absolute offset to the string
    , E.unsignedInt8 6 -- another number we're interested in
    , E.string (String.repeat 12 "\u{0000}") -- buffer. Its content is irrelevant.
    , E.string "hello" -- our actual string
    ]
        |> E.sequence
        |> E.encode
Now, to decode this, let's first try decoding the String by decoding the length and offset, and then reading the data:
import Bytes.Parser as P exposing (Parser)
string : Parser c e String
string =
    P.succeed Tuple.pair
        |> P.keep P.unsignedInt8
        |> P.keep P.unsignedInt8
        |> P.andThen readStringWithLengthAndOffset

readStringWithLengthAndOffset : ( Int, Int ) -> Parser c e String
readStringWithLengthAndOffset ( length, offset ) =
    P.randomAccess
        { offset = offset, relativeTo = P.startOfInput }
        (P.string length)

P.run string input
--> Ok "hello"
Now, to illustrate the "resume" behaviour, let's use the above parser, and also read the interesting number:
final : Parser c e { string : String, number : Int }
final =
    P.succeed (\s n -> { string = s, number = n })
        |> P.keep string
        |> P.keep P.unsignedInt8

P.run final input
--> Ok { string = "hello", number = 6 }
The trick here is that parsing continues its sequential behaviour, with the randomAccess parser running in a separate context.
If the offset isn't absolute, but relative, we can use a similar setup, with the addition of specifying the position we want the offset to be relative to using position.
relativeString : Parser c e String
relativeString =
    P.succeed readRelativeString
        |> P.keep P.unsignedInt8
        |> P.keep P.position
        |> P.keep P.unsignedInt8
        |> P.andThen identity

readRelativeString : Int -> P.Position -> Int -> Parser c e String
readRelativeString length marker offset =
    P.randomAccess
        { offset = offset, relativeTo = marker }
        (P.string length)
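To see relativeString at work, here is a hedged sketch of an input it could parse, reusing the relativeString parser defined above. The name `relativeInput` and the byte layout are made up for this illustration. The example assumes the offset is counted from the marker position, which is taken right after the length byte.

```elm
import Bytes exposing (Bytes)
import Bytes.Encode as E
import Bytes.Parser as P

-- Hypothetical input: the marker is taken at offset 1 (after the
-- length byte), so a relative offset of 3 points at absolute offset 4.
relativeInput : Bytes
relativeInput =
    [ E.unsignedInt8 5 -- length of the string
    , E.unsignedInt8 3 -- offset, relative to the marker
    , E.string (String.repeat 2 "\u{0000}") -- buffer. Its content is irrelevant.
    , E.string "hello" -- our actual string, at absolute offset 4
    ]
        |> E.sequence
        |> E.encode

P.run relativeString relativeInput
--> Ok "hello"
```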