Leniently parse html5 documents and fragments and then render them into strings or Elm's virtual dom nodes.
An html node is a tree of text, comments, and element nodes.
An element (e.g. <div foo="bar">hello</div>) can have attributes and child nodes.
{ legacyCompat : Basics.Bool
, root : Node
}
An html document has a <!doctype> and then a root html node.
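As a rough illustration of the shapes described above, the sketch below builds a node tree and a document by hand. It assumes that Html.Parser exposes the Node constructors and the Document alias, and that an element's attributes are a list of ( name, value ) string pairs; the module name Example and the values greeting and page are hypothetical.

```elm
module Example exposing (greeting, page)

import Html.Parser exposing (Document, Node(..))


{-| The element <div foo="bar">hello</div> as a Node value.
Assumption: attributes are ( name, value ) string pairs.
-}
greeting : Node
greeting =
    Element "div" [ ( "foo", "bar" ) ] [ Text "hello" ]


{-| A document record with the two fields listed above.
-}
page : Document
page =
    { legacyCompat = False
    , root = Element "html" [] [ Element "body" [] [ greeting ] ]
    }
```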
Configure the parser. Use the config constructors to create a config object.
allCharRefs : Config
A config with char reference decoding turned on.
This will add ~40kb to your bundle, but it is necessary to decode
entities like "&Delta;" into "Δ".
run allCharRefs "abc&Delta;def"
    == Ok [ Text "abcΔdef" ]
noCharRefs : Config
A config with char reference decoding turned off.
If you know that the html you are parsing never has named character references, or if it's sufficient to just consume them as undecoded text, then turning this off will shrink your bundle size.
run noCharRefs "abc&Delta;def"
    == Ok [ Text "abc&Delta;def" ]
customCharRefs : Dict String String -> Config
Provide your own character reference lookup dictionary.
Note that named character references are case sensitive. When providing your own,
you will want to consult the exhaustive Html.CharRefs.all dictionary to see which
names appear under more than one casing, like "quot" and "QUOT".
Here is an example of providing a small subset of commonly-seen character references.
config : Html.Parser.Config
config =
    [ ( "quot", "\"" )
    , ( "QUOT", "\"" )
    , ( "apos", "'" )
    , ( "gt", ">" )
    , ( "GT", ">" )
    , ( "Gt", ">" )
    , ( "lt", "<" )
    , ( "LT", "<" )
    , ( "Lt", "<" )
    , ( "amp", "&" )
    , ( "AMP", "&" )
    , ( "nbsp", "\u{00A0}" )
    ]
        |> Dict.fromList
        |> customCharRefs
run config "<span>&male; &amp; &female;</span>"
    == Ok [ Element "span" [] [ Text "&male; & &female;" ] ]
Notice that character references missing from the lookup table are simply parsed as text.
run : Config -> String -> Result (List Parser.DeadEnd) (List Node)
Parse an html fragment into a list of html nodes.
The html fragment can have multiple top-level nodes.
run allCharRefs "<div>hi</div><div>bye</div>"
    == Ok
        [ Element "div" [] [ Text "hi" ]
        , Element "div" [] [ Text "bye" ]
        ]
runElement : Config -> String -> Result (List Parser.DeadEnd) Node
Like run, except it only parses one top-level element and it always returns a single node.
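A minimal sketch of how runElement might be used, assuming Html.Parser exposes the Node constructors. The helper firstTag is hypothetical; it returns the tag name of the parsed element, and Nothing when parsing fails or the node is not an element.

```elm
module Example exposing (firstTag)

import Html.Parser exposing (Node(..))


{-| Tag name of the single top-level element, if there is one.
-}
firstTag : String -> Maybe String
firstTag source =
    case Html.Parser.runElement Html.Parser.allCharRefs source of
        Ok (Element tag _ _) ->
            Just tag

        Ok _ ->
            -- Parsed, but the node is text or a comment.
            Nothing

        Err _ ->
            -- The parser reported dead ends.
            Nothing
```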
runDocument : Config -> String -> Result (List Parser.DeadEnd) Document
Parses <!doctype html> and any html nodes after.
Always returns a single root node. Wraps nodes in a root <html> node if one is not present.
Caveat: If there are multiple top-level nodes and one of them is <html>, then this
function will wrap them all in another <html> node.
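The sketch below shows one way to inspect the result of runDocument, using only the root field of the Document record. The function describe and its message strings are hypothetical, and the sketch assumes Html.Parser exposes the Node constructors.

```elm
module Example exposing (describe)

import Html.Parser exposing (Node(..))


{-| Summarize a parsed document, or report how many dead ends the parser hit.
-}
describe : String -> String
describe source =
    case Html.Parser.runDocument Html.Parser.allCharRefs source of
        Ok document ->
            case document.root of
                Element tag _ children ->
                    "root <"
                        ++ tag
                        ++ "> with "
                        ++ String.fromInt (List.length children)
                        ++ " child node(s)"

                _ ->
                    "root is not an element"

        Err deadEnds ->
            "parse failed with "
                ++ String.fromInt (List.length deadEnds)
                ++ " dead end(s)"
```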
nodeToHtml : Node -> Html msg
Turn a single node into an Elm html node that Elm can render.
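A small sketch that pairs nodeToHtml with runElement, falling back to a text node when parsing fails. The name viewSnippet and the fallback message are hypothetical.

```elm
module Example exposing (viewSnippet)

import Html exposing (Html)
import Html.Parser


{-| Parse one top-level element and render it, or show a fallback message.
-}
viewSnippet : String -> Html msg
viewSnippet source =
    case Html.Parser.runElement Html.Parser.allCharRefs source of
        Ok node ->
            Html.Parser.nodeToHtml node

        Err _ ->
            Html.text "parse error"
```

The parsed markup ends up in the rendered view, so this is best applied to markup you trust.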
nodesToHtml : List Node -> List (Html msg)
Turn multiple html nodes into Elm html that Elm can render.
view : Html Msg
view =
    Html.div
        []
        ("<p>hello world</p>"
            |> Html.Parser.run Html.Parser.allCharRefs
            |> Result.map Html.Parser.nodesToHtml
            |> Result.withDefault [ Html.text "parse error" ]
        )
nodeToString : Node -> String
Convert an html node into a non-pretty string.
nodeToString (Element "a" [] [ Text "hi" ])
    == "<a>hi</a>"
nodesToString : List Node -> String
Convert multiple html nodes into a non-pretty string.
nodesToString
    [ Element "a" [] [ Text "hi" ]
    , Element "div" [] [ Element "span" [] [] ]
    ]
    == "<a>hi</a><div><span></span></div>"
nodeToPrettyString : Node -> String
Generate a pretty string for a single html node.
nodesToPrettyString : List Node -> String
Turn a node tree into a pretty-printed, indented html string.
("<a><b><c>hello</c></b></a>"
    |> Html.Parser.run Html.Parser.allCharRefs
    |> Result.map nodesToPrettyString
)
    == Ok """<a>
    <b>
        <c>
            hello
        </c>
    </b>
</a>"""
documentToString : Document -> String
Convert a document into a string starting with <!doctype html> followed by the root html node.
documentToPrettyString : Document -> String
Convert a document into a pretty, indented string.
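As a sketch of a parse-then-render round trip, the function below feeds the result of runDocument into documentToPrettyString. The name normalize and the error message are hypothetical; the parser's dead ends are discarded here for brevity.

```elm
module Example exposing (normalize)

import Html.Parser


{-| Parse a full document and re-render it as an indented string.
-}
normalize : String -> Result String String
normalize source =
    Html.Parser.runDocument Html.Parser.allCharRefs source
        |> Result.map Html.Parser.documentToPrettyString
        |> Result.mapError (\_ -> "could not parse document")
```

Swapping in documentToString would give the non-pretty form instead.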