hecrj / html-parser / Html.Parser

Parse HTML 5 in Elm. See https://www.w3.org/TR/html5/syntax.html

run : String -> Result (List Parser.DeadEnd) (List Node)

Run the parser!

run "<div><p>Hello, world!</p></div>"
-- => Ok [ Element "div" [] [ Element "p" [] [ Text "Hello, world!" ] ] ]

runDocument : String -> Result (List Parser.DeadEnd) Document

Run the parser on an entire HTML document.

runDocument "<!--First comment--><!DOCTYPE html><!--Test stuffs--><html></html><!--Footer comment!-->"
-- => Ok { preambleComments = ["First comment"], doctype = "", predocComments = ["Test stuffs"], document = ([],[]), postdocComments = ["Footer comment!"] }


type Node
    = Text String
    | Element String (List Attribute) (List Node)
    | Comment String

An HTML node. It can be a Text node, an Element (with its tag name, attributes, and children), or a Comment.
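As a minimal sketch of working with the three Node variants, the hypothetical helper below (textContent is not part of this package) collects the text inside a node by pattern matching. It assumes the constructors are exposed as Node(..), as the type definition above suggests.

```elm
module NodeText exposing (textContent)

import Html.Parser exposing (Node(..))


{-| Concatenate every Text found in a node and its descendants.
Comments contribute nothing. This is an illustrative helper,
not a function provided by Html.Parser.
-}
textContent : Node -> String
textContent node =
    case node of
        Text string ->
            string

        Element _ _ children ->
            -- Recurse into the children and join their text in order.
            String.concat (List.map textContent children)

        Comment _ ->
            ""
```

With this definition, textContent (Element "div" [] [ Element "p" [] [ Text "Hello, world!" ] ]) evaluates to "Hello, world!".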


type alias Document =
    { preambleComments : List String
    , doctype : String
    , predocComments : List String
    , document : ( List Attribute, List Node )
    , postdocComments : List String
    }

An HTML document.

This simply separates the document into its component parts, as defined by the WHATWG Standard.


type alias Attribute =
    ( String, String )

An HTML attribute. For instance:

( "href", "https://elm-lang.org" )
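Because an Attribute is a plain ( name, value ) tuple, ordinary List functions are enough to query a list of attributes. The sketch below shows one way to look up a value by name; getAttribute is a hypothetical helper, not something this package exports.

```elm
module AttributeLookup exposing (getAttribute)

import Html.Parser exposing (Attribute)


{-| Return the value of the first attribute with the given name, if any.
Illustrative only; Html.Parser does not provide this function.
-}
getAttribute : String -> List Attribute -> Maybe String
getAttribute name attributes =
    attributes
        |> List.filter (\( key, _ ) -> key == name)
        |> List.head
        |> Maybe.map Tuple.second
```

For example, getAttribute "href" [ ( "href", "https://elm-lang.org" ) ] evaluates to Just "https://elm-lang.org", and looking up a name that is absent gives Nothing.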

Internals

If you are building a parser of your own using elm/parser and you need to parse HTML... This section is for you!

node : Parser Node

Parse an HTML node.

You can use this in your own parser to add support for HTML 5.
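As a rough sketch of embedding node in a larger elm/parser pipeline, the example below parses a made-up input format in which a single HTML node follows an "html:" label. The label format and the name htmlAfterLabel are assumptions for illustration only.

```elm
module Example exposing (htmlAfterLabel)

import Html.Parser exposing (Node)
import Parser exposing ((|.), (|=), Parser)


{-| Parse input such as `html: <p>Hi</p>`.
The "html:" prefix is a hypothetical format invented for this example;
only Html.Parser.node comes from this package.
-}
htmlAfterLabel : Parser Node
htmlAfterLabel =
    Parser.succeed identity
        -- Require the literal label, then skip any whitespace.
        |. Parser.symbol "html:"
        |. Parser.spaces
        -- Hand over to the HTML node parser for the rest.
        |= Html.Parser.node
```

You would run it like any other parser, for instance Parser.run htmlAfterLabel "html: <p>Hi</p>". Input that does not start with the label fails with a list of Parser.DeadEnd values.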

nodeToString : Node -> String

Turn a parser node back into its HTML string.

For instance:

Element "a"
    [ ( "href", "https://elm-lang.org" ) ]
    [ Text "Elm" ]
    |> nodeToString

Produces <a href="https://elm-lang.org">Elm</a>.

documentToString : Document -> String

Turn a document back into its HTML string.

For instance:

{ preambleComments = [ "Early!" ]
, doctype = "LEGACY \"My legacy string stuff\""
, predocComments = [ "Teehee!" ]
, document = ( [], [ Element "p" [] [ Text "Got it." ], Element "br" [] [] ] )
, postdocComments = [ "Smelly feet" ]
}
    |> documentToString

Produces <!--Early!--><!DOCTYPE html LEGACY "My legacy string stuff"><!--Teehee!--><html><p>Got it.</p><br></html><!--Smelly feet-->.