Extracts text from HTML.
Block-level elements such as div are surrounded with whitespace,
but inline elements are not. Span is treated as a block-level element
because it is often used as a container.
Runs of breaking whitespace are collapsed to a single space, and the
result is trimmed.
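A minimal JavaScript sketch of the behavior described above, assuming a regex-based approach rather than a real HTML parser. The function name `htmlToTextSketch`, the tag pattern, and the list of inline tags are hypothetical illustrations, not the library's actual implementation. Entity decoding is omitted.

```javascript
// Hypothetical list of inline tags that are removed without adding whitespace.
// "span" is deliberately absent, so it is treated as block-level.
const INLINE_TAGS = /^(?:a|abbr|b|code|em|i|small|strong|sub|sup|u)$/i;

// Matches opening tags, closing tags, and DOCTYPE-like declarations,
// capturing the tag name (an approximation, not a full HTML parser).
const TAG_RE = /<[!\/]?([a-z0-9]+)(?:[\/\s][^>]*)?>/gi;

function htmlToTextSketch(html) {
  return html
    // Drop inline tags entirely; replace every other tag with a space.
    .replace(TAG_RE, (match, name) => (INLINE_TAGS.test(name) ? '' : ' '))
    // Collapse runs of breaking whitespace (space, tab, CR, LF) to one space.
    .replace(/[ \t\r\n]+/g, ' ')
    // Remove leading and trailing whitespace.
    .trim();
}

console.log(htmlToTextSketch('<div>Hello</div><div>wo<b>rl</b>d</div>'));
// → Hello world
```

The two `div` elements become separate words, while the inline `b` element leaves "world" intact.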
Matches all tags, HTML comments, and DOCTYPEs in tag-soup HTML.
By removing these matches and replacing any remaining '<' or '>'
characters with entities, we guarantee that the result can be embedded
in an attribute without introducing a tag boundary.
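The following sketch illustrates the two-step idea in this paragraph: remove comments, tags, and DOCTYPEs, then entity-escape any '<' or '>' that remains. The pattern `TAG_SOUP_RE` and the function `stripTagsForAttribute` are hypothetical names, and the regex is an assumed approximation of tag-soup matching.

```javascript
// Matches HTML comments, tags, and DOCTYPE-like declarations in tag soup.
// This pattern is an illustrative assumption, not a full HTML tokenizer.
const TAG_SOUP_RE = /<!--[\s\S]*?-->|<[!\/]?[a-z0-9]+(?:[\/\s][^>]*)?>/gi;

function stripTagsForAttribute(html) {
  return html
    // Remove every matched comment, tag, or DOCTYPE in a single pass.
    .replace(TAG_SOUP_RE, '')
    // Escape any angle brackets that survive (stray "<", malformed tags,
    // or fragments spliced together by the removal above).
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(stripTagsForAttribute('<!DOCTYPE html><!-- note --><p>1 < 2</p>'));
// → 1 &lt; 2
console.log(stripTagsForAttribute('<<b>div>'));
// → &lt;div&gt;
```

The escaping step is what carries the guarantee: removal alone can splice fragments such as `<<b>div>` into a new `<div>` tag. The sketch only rules out '<' and '>'; it does not escape quotes or '&'.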
Matches all tags that do not require extra whitespace when removed,
that is, inline elements.
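A sketch of how such a pattern might be used, assuming it is tested against a captured tag name rather than the whole tag. The regex `INLINE_TAG_RE`, its tag list, and the helper `replacementForTag` are hypothetical.

```javascript
// Hypothetical list of inline element names whose removal needs no extra
// whitespace. "span" is excluded because it is treated as block-level.
// No "g" flag, so test() keeps no lastIndex state between calls.
const INLINE_TAG_RE = /^(?:a|abbr|acronym|b|code|em|i|small|strong|sub|sup|u)$/i;

// Returns the string that should replace a removed tag with the given name:
// nothing for inline tags, a single space for everything else.
function replacementForTag(tagName) {
  return INLINE_TAG_RE.test(tagName) ? '' : ' ';
}

console.log(JSON.stringify(['B', 'br', 'span', 'em'].map(replacementForTag)));
// → [""," "," ",""]
```

The `^` and `$` anchors matter here: without them, `b` would also match `br` or `body`, and those tags would lose their separating space.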