labs.html.scrubber

Classes



Enumerations

goog.labs.html.scrubber.Group_ :
Groups of elements used to specify containment relationships.
Constants:
AREA_ELEMENT_
No description.
BLOCK_
No description.
CHARACTER_DATA_
No description.
COL_ELEMENT_
No description.
DL_PART_
No description.
FORM_ELEMENT_
No description.
HEAD_CONTENT_
No description.
INLINE_
No description.
INLINE_MINUS_A_
No description.
LEGEND_ELEMENT_
No description.
LI_ELEMENT_
No description.
MIXED_
No description.
OPTIONS_ELEMENT_
No description.
OPTION_ELEMENT_
No description.
PARAM_ELEMENT_
No description.
P_ELEMENT_
No description.
TABLE_CONTENT_
No description.
TABLE_ELEMENT_
No description.
TD_ELEMENT_
No description.
TOP_CONTENT_
No description.
TR_ELEMENT_
No description.
Code »
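The constant values are not shown on this page. A minimal sketch of how such groups could express containment, assuming they are bit flags combined with bitwise OR: every name ending in `Sketch` or `_SKETCH` is hypothetical, and the three tables below cover only a handful of elements. The two defaults (no contents entry means 0, no groups entry means inline) follow the descriptions of ELEMENT_CONTENTS_ and ELEMENT_GROUPS_.

```javascript
// Hypothetical bit flags; the real Group_ values are not shown on this page.
const GroupSketch = { BLOCK: 1, INLINE: 2, LI_ELEMENT: 4 };

// Groups each element belongs to (assumed default: inline).
const ELEMENT_GROUPS_SKETCH = {
  p: GroupSketch.BLOCK,
  ul: GroupSketch.BLOCK,
  li: GroupSketch.LI_ELEMENT
};

// Groups each element may contain (assumed default: 0, i.e. nothing).
const ELEMENT_CONTENTS_SKETCH = {
  p: GroupSketch.INLINE,
  ul: GroupSketch.LI_ELEMENT,
  li: GroupSketch.BLOCK | GroupSketch.INLINE
};

function canContainSketch(parent, child) {
  const has = (o, k) => Object.prototype.hasOwnProperty.call(o, k);
  const contents = has(ELEMENT_CONTENTS_SKETCH, parent) ?
      ELEMENT_CONTENTS_SKETCH[parent] : 0;
  const groups = has(ELEMENT_GROUPS_SKETCH, child) ?
      ELEMENT_GROUPS_SKETCH[child] : GroupSketch.INLINE;
  // The parent can hold the child if any of the child's groups is allowed.
  return (contents & groups) !== 0;
}
```

With these tables, canContainSketch('ul', 'li') is true and canContainSketch('p', 'ul') is false.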
goog.labs.html.scrubber.Scope_ :
Element scopes limit where close tags can have effects. For example, a table cannot be implicitly closed by a </p> even if the table appears inside a <p> because the <table> element introduces a scope.
Constants:
BUTTON_
No description.
COMMON_
No description.
LIST_ITEM_
No description.
TABLE_
No description.
Code »
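The scoping rule can be illustrated with a small stack search that refuses to look past a scope-introducing element. This is a sketch only, not the library's code: `SCOPE_BOUNDARIES_SKETCH` and `closeWithinScopeSketch` are hypothetical names, and treating `table` and `button` as the only boundaries is an assumption made for the example.

```javascript
// Hypothetical set of elements that introduce a scope.
const SCOPE_BOUNDARIES_SKETCH = new Set(['table', 'button']);

// Returns the stack length after applying </name>. If the end tag is blocked
// by a scope boundary or matches nothing, the length is unchanged.
function closeWithinScopeSketch(openStack, name) {
  for (let i = openStack.length - 1; i >= 0; i--) {
    if (openStack[i] === name) {
      return i;  // close this element and everything opened after it
    }
    if (SCOPE_BOUNDARIES_SKETCH.has(openStack[i])) {
      break;     // an end tag cannot reach past a scope boundary
    }
  }
  return openStack.length;
}
```

For the open stack ['p', 'table', 'tr', 'td'], a </p> leaves the length at 4 because the table blocks it, while for ['p', 'b', 'i'] the same end tag returns 0.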

Global Functions

goog.labs.html.scrubber.balance(html) string
Balances tags in trusted HTML.
Arguments:
html : string
a string of HTML
Returns: string  the input but with an end-tag for each non-void start tag and only for non-void start tags, and with start and end tags nesting properly.
code »
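As a rough illustration of what "balanced" means here, the sketch below closes unclosed non-void elements and drops stray end tags. It is not the library's algorithm: `balanceSketch` and `VOID_ELEMENTS_SKETCH` are hypothetical names, the tokenizer is a simple regular expression, and the group, scope, and nesting-limit rules described elsewhere on this page are ignored.

```javascript
// Hypothetical, partial list of void elements (those that never take an end tag).
const VOID_ELEMENTS_SKETCH = new Set(
    ['area', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param']);

function balanceSketch(html) {
  // Tags, runs of text, or a lone '<' that does not start a tag.
  const tokens = html.match(/<\/?[a-zA-Z][^>]*>|[^<]+|</g) || [];
  const open = [];  // names of elements opened and not yet closed
  const out = [];
  for (const token of tokens) {
    const m = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/.exec(token);
    if (!m) {
      out.push(token === '<' ? '&lt;' : token);  // text node
      continue;
    }
    const name = m[2].toLowerCase();
    if (m[1] !== '/') {                          // start tag
      out.push(token);
      if (!VOID_ELEMENTS_SKETCH.has(name)) open.push(name);
      continue;
    }
    const idx = open.lastIndexOf(name);
    if (idx < 0) continue;                       // stray end tag: drop it
    while (open.length > idx) {                  // close innermost first
      out.push('</' + open.pop() + '>');
    }
  }
  while (open.length) out.push('</' + open.pop() + '>');
  return out.join('');
}
```

Under these assumptions, balanceSketch('<b><i>x</b>') yields '<b><i>x</i></b>', and balanceSketch('a<br>b</p>') yields 'a<br>b'.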
goog.labs.html.scrubber.balance_(htmlTokens) !Array.<string>
Ensures that there are end-tags for all and only for non-void start tags.
Arguments:
htmlTokens : Array.<string>
an array of HTML tokens as returned by goog.labs.html.scrubber.lex_.
Returns: !Array.<string>  the input array modified in place to have some tokens removed.
code »
goog.labs.html.scrubber.filterAttrs_(attrsText, genericAttrWhitelist, tagSpecificAttrWhitelist) string
Parses attribute names and values out of a tag body and applies the attribute white-list to produce a tag body containing only safe attributes.
Arguments:
attrsText : string
the text of a tag between the end of the tag name and the beginning of the tag end marker, so " foo bar='baz'" for the tag <tag foo bar='baz'/>.
genericAttrWhitelist : Object.<string, goog.labs.html.AttributeRewriter>
a whitelist of attribute transformations for attributes that are allowed on any element.
tagSpecificAttrWhitelist : Object.<string, goog.labs.html.AttributeRewriter>
a whitelist of attribute transformations for attributes that are allowed on the element started by the tag whose body is attrsText.
Returns: string  a tag-body that consists only of safe attributes.
code »
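A minimal sketch of this kind of attribute filtering, under several assumptions: `filterAttrsSketch` is a hypothetical name, the attribute regular expression is far simpler than the ATTRS_RE_ described under Global Properties, values are not entity-decoded before being passed to the rewriter, and preferring the tag-specific rewriter over the generic one is a guess at the precedence.

```javascript
function filterAttrsSketch(attrsText, genericWhitelist, tagWhitelist) {
  // name, then optionally = "double" | 'single' | unquoted
  const attrRe =
      /([^\s"'=\/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  const own = (o, k) =>
      Object.prototype.hasOwnProperty.call(o, k) ? o[k] : undefined;
  let out = '';
  let m;
  while ((m = attrRe.exec(attrsText)) !== null) {
    const name = m[1].toLowerCase();
    const value = m[2] ?? m[3] ?? m[4] ?? '';  // valueless attribute -> ''
    const rewriter = own(tagWhitelist, name) || own(genericWhitelist, name);
    if (typeof rewriter !== 'function') continue;  // not whitelisted
    const safe = rewriter(value);
    if (safe === null) continue;                   // rejected by the rewriter
    out += ' ' + name + '="' +
        String(safe).replace(/&/g, '&amp;').replace(/"/g, '&quot;') + '"';
  }
  return out;
}
```

With a generic whitelist of {foo: v => v} and an empty tag-specific whitelist, the input " foo bar='baz'" comes back as ' foo=""': bar is dropped because neither whitelist lists it.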
goog.labs.html.scrubber.filter_(tagWhitelist, attrWhitelist, htmlTokens) !Array.<string>
Replaces tags not on the white-list with empty text nodes, dropping all attributes, and drops other non-text nodes such as comments.
Arguments:
tagWhitelist : !Object.<string, boolean>
a set of lower-case tag names following the convention established by goog.object.createSet.
attrWhitelist : !Object.<string, Object.<string, goog.labs.html.AttributeRewriter>>
maps lower-case tag names and the special string "*" to functions from decoded attribute values to sanitized values or null to indicate that the attribute is not allowed with that value. For example, if attrWhitelist['a']['href'] is defined then it is used to sanitize the value of the link's URL. If attrWhitelist['*']['id'] is defined, and attrWhitelist['div']['id'] is not, then the former is used to sanitize any id attribute on a <div> element.
htmlTokens : !Array.<string>
an array of HTML tokens as returned by goog.labs.html.scrubber.lex_.
Returns: !Array.<string>  the input array modified in place to have some tokens removed.
code »
goog.labs.html.scrubber.lex_(html) !Array.<string>
Returns an array of HTML tokens including tags, text nodes and comments. "Special" elements, like <script>...</script> whose bodies cannot include nested elements, are returned as single tokens.
Arguments:
html : string
a string of HTML
Returns: !Array.<string>  No description.
code »
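To make the token shapes concrete, here is a heavily simplified lexer sketch. `lexSketch` is a hypothetical name, the list of special elements is an assumption, and the pattern is nowhere near as thorough as the HTML_TOKENS_RE_ described under Global Properties (it ignores doctypes and bogus comments, for instance).

```javascript
function lexSketch(html) {
  const tokenRe = new RegExp(
      '<!--[\\s\\S]*?-->' +                                       // comment
      '|<(script|style|textarea|title)\\b[\\s\\S]*?</\\1\\s*>' +  // special element, whole
      '|</?[a-zA-Z][^>]*>' +                                      // ordinary tag
      '|[^<]+' +                                                  // text node
      '|<',                                                       // stray '<'
      'gi');
  return html.match(tokenRe) || [];
}
```

For the input '<p>a<script>if (a<b) x();</script><!-- c --></p>' this produces five tokens: '<p>', 'a', the entire script element, the comment, and '</p>'. The '<' inside the script body does not start a new token.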
goog.labs.html.scrubber.pickElementsToClose_(lowerCaseTagName, isCloseTag, openElementStack) number
Picks which open HTML elements to close.
Arguments:
lowerCaseTagName : string
The name of the tag.
isCloseTag : boolean
True for a </tagname> tag.
openElementStack : Array.<string>
The names of elements that have been opened and not subsequently closed.
Returns: number  the length of openElementStack after closing any tags that need to be closed.
code »
goog.labs.html.scrubber.readOwnProperty_(o, k) *
No description.
Arguments:
o : !Object
the object
k : !string
a key into o
Returns: *  No description.
code »
goog.labs.html.scrubber.render_(htmlTokens) string
Normalizes HTML tokens and concatenates them into a string.
Arguments:
htmlTokens : Array.<string>
an array of HTML tokens as returned by goog.labs.html.scrubber.lex_.
Returns: string  a string of HTML.
code »
goog.labs.html.scrubber.scrub(tagWhitelist, attrWhitelist, html) string
Replaces tags not on the white-list with empty text nodes, dropping all attributes, and drops other non-text nodes such as comments.
Arguments:
tagWhitelist : !Object.<string, boolean>
a set of lower-case tag names following the convention established by goog.object.createSet.
attrWhitelist : !Object.<string, Object.<string, goog.labs.html.AttributeRewriter>>
maps lower-case tag names and the special string "*" to functions from decoded attribute values to sanitized values or null to indicate that the attribute is not allowed with that value. For example, if attrWhitelist['a']['href'] is defined then it is used to sanitize the value of the link's URL. If attrWhitelist['*']['id'] is defined, and attrWhitelist['div']['id'] is not, then the former is used to sanitize any id attribute on a <div> element.
html : string
a string of HTML
Returns: string  the input but with potentially dangerous tokens removed.
code »
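The attrWhitelist lookup described above (tag-specific entry first, then the "*" entry) can be sketched as follows. `pickRewriterSketch` and `exampleAttrWhitelist` are hypothetical and only illustrate the documented fallback; the two rewriters are toy validators, not recommended sanitizers.

```javascript
// Returns the rewriter to use for attrName on tagName, or null if none applies.
function pickRewriterSketch(attrWhitelist, tagName, attrName) {
  // Read own properties only, so inherited keys like 'toString' never match.
  const own = (o, k) =>
      (o && Object.prototype.hasOwnProperty.call(o, k)) ? o[k] : undefined;
  const tagSpecific = own(own(attrWhitelist, tagName), attrName);
  if (typeof tagSpecific === 'function') return tagSpecific;
  const generic = own(own(attrWhitelist, '*'), attrName);
  return typeof generic === 'function' ? generic : null;
}

// Rewriters map a decoded value to a sanitized value, or null to reject it.
const exampleAttrWhitelist = {
  '*': { 'id': (v) => /^[a-z][\w-]*$/i.test(v) ? v : null },
  'a': { 'href': (v) => /^https?:\/\//i.test(v) ? v : null }
};
```

Here pickRewriterSketch(exampleAttrWhitelist, 'div', 'id') falls back to the '*' entry, and the href rewriter for 'a' returns null for a value such as 'javascript:alert(1)'.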

Global Properties

goog.labs.html.scrubber.ALL_SCOPES_ :
No description.
Code »
goog.labs.html.scrubber.ATTRS_ :
The body of a tag between the end of the name and the closing > if any.
Code »
goog.labs.html.scrubber.ATTRS_RE_ :
A global matcher that separates attributes out of the tag body cruft.
Code »
goog.labs.html.scrubber.ATTR_VALUE_ :
Matches the equals-sign and any attribute value following it, but does not capture any > that would close the tag.
Code »
goog.labs.html.scrubber.ATTR_VALUE_PRECEDER_ :
Matches content following a tag name or attribute value, and before the beginning of the next attribute value.
Code »
goog.labs.html.scrubber.BALANCE_NESTING_LIMIT_ :
We cap the nesting depth of balanced HTML at a large but manageable number so that built-in browser limits are unlikely to kick in and undo our matching of start and end tags.
This mitigates the HTML parsing equivalent of stack smashing attacks.
Otherwise, crafted inputs like <p><p><p><p>...Ad nauseam...</p></p></p></p> could exploit browser bugs, and/or undocumented nesting limit recovery code to misnest tags.
Code »
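One plausible way to enforce such a cap is to drop start tags that would push the open-element stack past the limit. The sketch below assumes exactly that policy, which this page does not confirm. `limitDepthSketch` is a hypothetical name, the input is a pre-tokenized array of bare lower-case tags and text, and implicit closing (such as a <p> closing a previous <p>) is deliberately ignored.

```javascript
function limitDepthSketch(tokens, limit) {
  const open = [];
  const out = [];
  for (const token of tokens) {
    const m = /^<(\/?)([a-z][a-z0-9]*)>$/.exec(token);
    if (!m) {                       // text passes through untouched
      out.push(token);
      continue;
    }
    if (m[1]) {                     // end tag
      if (open.length && open[open.length - 1] === m[2]) {
        open.pop();
        out.push(token);
      }                             // otherwise drop the unmatched end tag
    } else if (open.length < limit) {
      open.push(m[2]);
      out.push(token);
    }                               // otherwise drop the too-deep start tag
  }
  return out;
}
```

With a limit of 2, the tokens '<p>', '<p>', '<p>', 'x', '</p>', '</p>', '</p>' reduce to '<p><p>x</p></p>' when joined: the third start tag and its now-unmatched end tag are both dropped.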
goog.labs.html.scrubber.BLOCK_CONTAINERS_ :
Per-element, a child that can contain block content.
Code »
goog.labs.html.scrubber.BREAK_ :
Matches when the next character cannot continue a tag name.
Code »
goog.labs.html.scrubber.CC_BANG_ :
Character code constant for '!'.
Code »
goog.labs.html.scrubber.CC_LT_ :
Character code constant for '<'.
Code »
goog.labs.html.scrubber.CC_QMARK_ :
Character code constant for '?'.
Code »
goog.labs.html.scrubber.CC_SLASH_ :
Character code constant for '/'.
Code »
goog.labs.html.scrubber.COMMENT_ :
Matches HTML comments including HTML 5 "bogus comments" of the form <!...> or <?...> or </...>.
Code »
goog.labs.html.scrubber.DOUBLE_QUOTED_ATTR_VALUE_ :
No description.
Code »
goog.labs.html.scrubber.ELEMENT_CONTENTS_ :
The groups which the element can contain. Defaults to 0.
Code »
goog.labs.html.scrubber.ELEMENT_GROUPS_ :
The groups into which the element falls. The default is an inline element.
Code »
goog.labs.html.scrubber.ELEMENT_SCOPES_ :
The scopes in which an element falls. An element with no property defaults to 0.
Code »
goog.labs.html.scrubber.HTML_TOKENS_RE_ :
Regexp pattern for an HTML token after a doctype. Special elements introduce a capturing group for use with a back-reference.
Code »
goog.labs.html.scrubber.SINGLE_QUOTED_ATTR_VALUE_ :
No description.
Code »
goog.labs.html.scrubber.SPECIAL_ELEMENT_ :
Matches the open tag and body of a special element: one whose body cannot contain nested elements and so uses special parsing rules. It does not include the end tag.
Code »
goog.labs.html.scrubber.TAG_ :
Regexp pattern for an HTML tag.
Code »
goog.labs.html.scrubber.TAG_NAME_CHAR_ :
A character that continues a tag name as defined at http://www.w3.org/html/wg/drafts/html/master/syntax.html#tag-name-state
Code »
goog.labs.html.scrubber.TAG_RE_ :
An HTML tag which captures the name in group 1, and any attributes in group 2.
Code »
goog.labs.html.scrubber.TEXT_NODE_ :
Regexp pattern for an HTML text node.
Code »
goog.labs.html.scrubber.UNQUOTED_ATTR_VALUE_ :
No description.
Code »
