allenap / elm-json-decode-broken / Json.Decode.Broken

Parse/decode broken JSON.

When reading these docs or the code in this module, it might be useful to refer to json.org (for the diagrams, which are especially helpful) or RFC 8259 (for the official word).

Parsing

When successful, parsing returns a Json.Encode.Value. Use it with Json.Decode.decodeValue to extract the information you need into your application's data structures.

parse : String -> Result (List Parser.DeadEnd) Json.Encode.Value

Parse the given JSON string.

This assumes a spec-compliant JSON string; it will choke on "broken" JSON. This seems kind of weird for a package that's all about parsing broken JSON. However, we all have to start somewhere. Read the code, copy it, modify it, make it work for your use case.

Errors come straight from elm/parser and may not be super useful. Sorry. I may switch to elm/parser's Parser.Advanced to improve this at some point.
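As a rough illustration of the parse-then-decode flow described above, the sketch below parses a string and then pulls out a single field. The module name Example, the function decodeName, and the "name" field are hypothetical and only serve the example.

```elm
module Example exposing (decodeName)

import Json.Decode as Decode
import Json.Decode.Broken as Broken


-- Hypothetical helper: parse the input, then decode a "name" field.
-- Both failure modes are collapsed into a human-readable String.
decodeName : String -> Result String String
decodeName input =
    case Broken.parse input of
        Err deadEnds ->
            -- elm/parser reports a list of dead ends; here we only count them.
            Err
                ("parse failed with "
                    ++ String.fromInt (List.length deadEnds)
                    ++ " dead end(s)"
                )

        Ok value ->
            Decode.decodeValue (Decode.field "name" Decode.string) value
                |> Result.mapError Decode.errorToString
```

Keeping the two steps separate means the decoder stays an ordinary Json.Decode decoder; only the first step changes if you later swap parse for parseWith.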

Custom parsing

This isn't going to get you total flexibility, but using Config and co. will at least help you put together a consistent parser for JSON-like data. By consistent, I mean that if you override, say, the number parser, it will be applied everywhere you might expect to see a number, be that at the top level or nested at any depth within objects or arrays.

parseWith : Config -> String -> Result (List Parser.DeadEnd) Json.Encode.Value

Parse the given JSON string with a custom configuration.


type Config
    = Config
        { json : Config -> Parser Json.Encode.Value
        , value : Config -> Parser Json.Encode.Value
        , object : Config -> Parser Json.Encode.Value
        , array : Config -> Parser Json.Encode.Value
        , key : Parser String
        , string : Parser Json.Encode.Value
        , number : Parser Json.Encode.Value
        , true : Parser Json.Encode.Value
        , false : Parser Json.Encode.Value
        , null : Parser Json.Encode.Value
        , ws : Parser ()
        }

Configuration for the parser.

defaultConfig : Config

Default configuration for the parser.
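One way to derive a configuration from defaultConfig is to unwrap the Config constructor and update a single field. The sketch below does that for number, accepting an optional leading "+" that strict JSON forbids. The names PlusNumbers, plusNumber, plusConfig, and parsePlus are hypothetical, and the sketch assumes the default value parser consults the number field, as the description of Config suggests.

```elm
module PlusNumbers exposing (parsePlus, plusConfig)

import Json.Decode.Broken as Broken exposing (Config(..))
import Json.Encode as Encode
import Parser exposing ((|.), (|=), Parser)


-- Hypothetical number parser: skip an optional leading "+",
-- then defer to the package's own strict number parser.
plusNumber : Parser Encode.Value
plusNumber =
    Parser.oneOf
        [ Parser.succeed identity
            |. Parser.symbol "+"
            |= Broken.number
        , Broken.number
        ]


-- Derive a configuration from defaultConfig, replacing only `number`.
plusConfig : Config
plusConfig =
    case Broken.defaultConfig of
        Config fields ->
            Config { fields | number = plusNumber }


parsePlus : String -> Result (List Parser.DeadEnd) Encode.Value
parsePlus =
    Broken.parseWith plusConfig
```

Because the override lives in the configuration rather than in one call site, an input such as "[+1, 2]" should be handled the same way whether the number is at the top level or nested.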

Top-level parsers

Using this module, a strict parser for a JSON value is defined by:

Parser.oneOf
    [ object defaultConfig
    , array defaultConfig
    , string
    , number
    , true
    , false
    , null
    ]

According to the specification, a JSON document is: optional whitespace, a JSON value (that oneOf … expression above), then more optional whitespace. That's what the json parser does. Hence parsing a compliant JSON document is:

Parser.run (json defaultConfig) "…"

Those component parsers are also exposed, as are several other sub-parsers. Use them as building blocks to compose a parser for broken JSON as you need. If you need to parse non-compliant quoted strings, for example, you might start by looking at stringLiteral. It might even be best to copy just the string code from this module into your project, and use the other parsers in this module – object, array, and so on – to compose a new parser by creating a new Config or deriving from defaultConfig.

json : Config -> Parser Json.Encode.Value

Parser for JSON.

This is a JSON value surrounded by optional whitespace.

Parsers for objects

object : Config -> Parser Json.Encode.Value

Parser for a JSON object.

key : Parser String

Parser for a JSON object key.

Parsers for arrays

array : Config -> Parser Json.Encode.Value

Parser for a JSON array.

Parsers for strings

string : Parser Json.Encode.Value

Parser for a quoted JSON string.

string and some of its helpers have been adapted from elm/parser's DoubleQuoteString example.

stringLiteral : Parser String -> Parser Char -> Parser String

Parser for a quoted JSON string literal.

This gives some flexibility over parsing unescaped string content and escape sequences. The literal must still start and end with ", but it's possible to change the rules for content to allow, for example, new-lines or carriage returns, or to process non-standard escape sequences.

One other difference from string is that this yields the actual String rather than a re-encoded Value. This is also used for object keys which need to be captured as String.
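To make the flexibility concrete, here is a sketch of a string parser that tolerates raw new-lines and tabs inside quoted strings. It assumes the first argument of stringLiteral parses unescaped content and the second parses an escape sequence, which matches the types of unescaped and escape but is not spelled out above. The names LenientStrings, lenientUnescaped, lenientLiteral, and lenientConfig are hypothetical.

```elm
module LenientStrings exposing (lenientConfig)

import Json.Decode.Broken as Broken exposing (Config(..))
import Json.Encode as Encode
import Parser exposing ((|.), Parser)


-- Hypothetical replacement for `unescaped`: one or more characters
-- that are neither a double quote nor a backslash. Control characters
-- such as new-lines and tabs are deliberately allowed through.
lenientUnescaped : Parser String
lenientUnescaped =
    let
        allowed c =
            c /= '"' && c /= '\\'
    in
    Parser.getChompedString
        (Parser.chompIf allowed
            |. Parser.chompWhile allowed
        )


-- Assumed argument order: unescaped content first, escape sequence second.
lenientLiteral : Parser String
lenientLiteral =
    Broken.stringLiteral lenientUnescaped Broken.escape


-- Use the lenient literal for both string values and object keys.
lenientConfig : Config
lenientConfig =
    case Broken.defaultConfig of
        Config fields ->
            Config
                { fields
                    | string = Parser.map Encode.string lenientLiteral
                    , key = lenientLiteral
                }
```

The content parser requires at least one character (chompIf before chompWhile) so that it cannot succeed without consuming input, whatever looping strategy stringLiteral uses internally.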

escape : Parser Char

Parser for an escape sequence.

This does not include the leading escape prefix, i.e. the backslash (written \\ in Elm string and character literals).

unicodeHexCode : Parser String

Parser for a Unicode hexadecimal code.

E.g. "AbCd" or "1234" or "000D".

It will match exactly 4 hex digits, case-insensitive.

Goes well with hexChar.
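A small sketch of how the two might be combined into a parser for the part of a \uXXXX escape that follows the backslash. The names UnicodeEscape and unicodeEscape are hypothetical, and the sketch assumes hexChar accepts exactly the four-digit string that unicodeHexCode yields.

```elm
module UnicodeEscape exposing (unicodeEscape)

import Json.Decode.Broken as Broken
import Parser exposing ((|.), (|=), Parser)


-- Matches "u" followed by four hex digits, e.g. "u0041",
-- and converts the digits to the corresponding Char.
unicodeEscape : Parser Char
unicodeEscape =
    Parser.succeed Broken.hexChar
        |. Parser.symbol "u"
        |= Broken.unicodeHexCode
```

A parser of this shape has the same type as escape, so it could be one branch of a custom escape parser handed to stringLiteral.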

unescaped : Parser String

Parser for unescaped string contents.

The JSON specifications are strict about which characters are permissible in a quoted string. Perhaps most interestingly, horizontal tabs, new-lines, and carriage returns are not permitted; these must be escaped.

Parsers for numbers

number : Parser Json.Encode.Value

Parser for a JSON number.

int : Parser String

Parser for the integer portion of a JSON number.

123.456e+78
^^^

frac : Parser ()

Parser for an optional fractional portion of a JSON number.

123.456e+78
   ^^^^

exp : Parser ()

Parser for an optional exponent portion of a JSON number.

123.456e+78
       ^^^^
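The three portions can be chained to recover the text of a whole number. The sketch below captures that text and converts it with String.toFloat. The names NumberParts, numberText, and numberFloat are hypothetical, and this is not necessarily how number itself is implemented.

```elm
module NumberParts exposing (numberFloat, numberText)

import Json.Decode.Broken as Broken
import Parser exposing ((|.), Parser)


-- Chomp the integer, optional fraction, and optional exponent in
-- sequence, then capture everything that was consumed as a String.
numberText : Parser String
numberText =
    Parser.getChompedString
        (Broken.int
            |. Broken.frac
            |. Broken.exp
        )


-- Convert the captured text to a Float, failing the parser if
-- String.toFloat rejects it.
numberFloat : Parser Float
numberFloat =
    numberText
        |> Parser.andThen
            (\text ->
                case String.toFloat text of
                    Just n ->
                        Parser.succeed n

                    Nothing ->
                        Parser.problem ("not a number: " ++ text)
            )
```

Keeping the textual form available is handy when a "broken" source uses numbers that a Float cannot represent exactly and you would rather keep them as strings.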

digit : Parser ()

Parser for a single decimal digit.

digits : Parser ()

Parser for one or more decimal digits.

This chomps characters; it does not yield them. Wrap with getChompedString to obtain the matched string.
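For instance, a minimal sketch of that wrapping (the names DigitRun and digitRun are hypothetical):

```elm
module DigitRun exposing (digitRun)

import Json.Decode.Broken as Broken
import Parser exposing (Parser)


-- `digits` only advances the parser; getChompedString recovers
-- the text it advanced over.
digitRun : Parser String
digitRun =
    Parser.getChompedString Broken.digits
```

Note that Parser.run does not require the whole input to be consumed, so running digitRun against "2024-01" stops after the leading run of digits.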

digitsMaybe : Parser ()

Parser for zero or more decimal digits.

zero : Parser ()

Parser for a single decimal zero digit, 0.

oneNine : Parser ()

Parser for a single decimal digit between 1 and 9 inclusive.

Parsers for the others

true : Parser Json.Encode.Value

Parser for a JSON true literal.

false : Parser Json.Encode.Value

Parser for a JSON false literal.

null : Parser Json.Encode.Value

Parser for a JSON null literal.

ws : Parser ()

Parser for JSON whitespace.

This is the whitespace that appears between significant elements of JSON, and before and after JSON documents, not whitespace within quoted strings.
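Since ws is a Config field, one plausible use is to widen what counts as whitespace. The sketch below also skips // line comments, which strict JSON does not allow. The names CommentWhitespace, isJsonWhitespace, wsStep, wsWithComments, and commentConfig are hypothetical, and the sketch assumes the default json, object, and array parsers take their whitespace from the configuration.

```elm
module CommentWhitespace exposing (commentConfig)

import Json.Decode.Broken as Broken exposing (Config(..))
import Parser exposing ((|.), Parser, Step(..))


-- The four whitespace characters permitted by RFC 8259.
isJsonWhitespace : Char -> Bool
isJsonWhitespace c =
    c == ' ' || c == '\n' || c == '\u{000D}' || c == '\t'


-- One loop step: skip a line comment, skip a run of whitespace,
-- or stop. The first two branches always consume input, so the
-- loop cannot spin without making progress.
wsStep : () -> Parser (Step () ())
wsStep _ =
    Parser.oneOf
        [ Parser.lineComment "//"
            |> Parser.map (\_ -> Loop ())
        , Parser.chompIf isJsonWhitespace
            |. Parser.chompWhile isJsonWhitespace
            |> Parser.map (\_ -> Loop ())
        , Parser.succeed (Done ())
        ]


wsWithComments : Parser ()
wsWithComments =
    Parser.loop () wsStep


-- Derive a configuration that treats comments as whitespace.
commentConfig : Config
commentConfig =
    case Broken.defaultConfig of
        Config fields ->
            Config { fields | ws = wsWithComments }
```

With this configuration, Broken.parseWith commentConfig is intended to accept documents that have comments before, after, or between values, while string contents are untouched.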

Useful functions

hexChar : String -> Char

Convert a Unicode hexadecimal code to a Char.

Useful with unicodeHexCode.

Note that ECMA-404 does not put a limit on the character ranges, i.e. it is permissible in JSON to specify a character for which Unicode does not have a character assignment. This leans on the behaviour of Char.fromCode to determine what happens for codes not covered by Unicode.

yields : a -> Parser b -> Parser a

Parser that, on success, always returns the given value, discarding whatever the wrapped parser produced.

For example:

token "true" |> yields (Encode.bool True)

When the token true is matched, a boolean true value is yielded.
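As a sketch of yields in a "broken JSON" setting, the configuration below additionally accepts the Python-style literals True, False, and None. The names PythonLiterals, orToken, and pythonConfig are hypothetical.

```elm
module PythonLiterals exposing (pythonConfig)

import Json.Decode.Broken as Broken exposing (Config(..))
import Json.Encode as Encode
import Parser exposing (Parser)


-- Try the strict parser first; otherwise accept an alternative
-- spelling and yield the same JSON value for it.
orToken : String -> Encode.Value -> Parser Encode.Value -> Parser Encode.Value
orToken word value strict =
    Parser.oneOf
        [ strict
        , Parser.token word |> Broken.yields value
        ]


-- Derive a configuration whose literal parsers accept both spellings.
pythonConfig : Config
pythonConfig =
    case Broken.defaultConfig of
        Config fields ->
            Config
                { fields
                    | true = orToken "True" (Encode.bool True) Broken.true
                    , false = orToken "False" (Encode.bool False) Broken.false
                    , null = orToken "None" Encode.null Broken.null
                }
```

The strict parsers stay first in each oneOf, so compliant input behaves as before and the alternative spellings are only tried as a fallback.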