commons-issues mailing list archives

From "Robert Sussland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-1042) StringEscapeUtils.escapeHtml() does not escape single quote
Date Wed, 08 Oct 2014 20:20:33 GMT

    [ https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164093#comment-14164093 ]

Robert Sussland commented on LANG-1042:
---------------------------------------

I was not expecting this data to be safe for arbitrary injection into html. String escaping
has a well-defined meaning -- the output of this function should not be able to break out of
a string data context, because all characters that could be interpreted by the html parser
as closing out the string data context are escaped.

This is exactly the string escaping behavior of other methods in this package, and what is
commonly known as string escaping. 

cf. https://tomcat.apache.org/taglibs/standard/apidocs/org/apache/taglibs/standard/util/EscapeXML.html

In terms of which characters need to be escaped, HTML as well as XML only allow string data
in two places: attribute values and text nodes. Control characters that denote start/end of
attribute values and text nodes are well-defined and finite: single/double quote for attribute
values and angle brackets < and > for text nodes. This assumes that the template as a whole
is valid html.


Additionally, the escaping symbol itself (the ampersand) should also be escaped so that the
mapping is one-to-one and an unescaping method is possible.
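
As a rough illustration of the character set described above (both quote characters, the
angle brackets, and the escaping symbol itself, which in html is the ampersand), here is a
minimal Java sketch. The class and method names are hypothetical; this is not the Commons
Lang implementation, and it assumes the output is placed only in a quoted attribute value or
a text node of otherwise valid html.

```java
// Hypothetical sketch, not part of Commons Lang.
final class HtmlStringEscapeSketch {

    // Escapes the five characters that can end a string data context
    // (or start an escape sequence) in html: & < > " '
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // the escaping symbol itself
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // numeric reference for single quote
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The single quotes are replaced by &#39; so the value cannot close the attribute.
        System.out.println(escape("' onclick='payload' "));
    }
}
```

Because the ampersand is escaped along with everything else in a single pass, input that
already contains something like `&amp;` is escaped again rather than passed through, which
keeps the mapping reversible.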

Finally, there is little value to a method that performs only html entity encoding -- unless
you are building an html entity encoding demonstration method. The list of html entities was
selected as a convenience so that html developers would not need to memorize ascii/unicode
values for commonly used symbols such as e-accent, the less than sign, or the euro sign. The
list of html entities is not the list of html control characters and is not relevant for an
html string escaping method.

> StringEscapeUtils.escapeHtml() does not escape single quote
> -----------------------------------------------------------
>
>                 Key: LANG-1042
>                 URL: https://issues.apache.org/jira/browse/LANG-1042
>             Project: Commons Lang
>          Issue Type: Bug
>            Reporter: Robert Sussland
>            Priority: Critical
>
> The String Escape Utils should ensure that encoded data cannot escape from a string.
> However in HTML (starting with 1.0 and until the present), attribute values may be denoted
> by either single or double quotes. Therefore single quotes need to be escaped just as much
> as double quotes.
> From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
> {quote}
> By default, SGML requires that all attribute values be delimited using either double
> quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote
> marks can be included within the attribute value when the value is delimited by double quote
> marks, and vice versa. Authors may also use numeric character references to represent double
> quotes (&#34;) and single quotes (&#39;). For double quotes authors can
> also use the character entity reference &quot;.
> {quote}
> Note that there have been several bugs in the wild in which string encoders use this
> library under the hood, and as a result fail to properly escape html attributes in which user
> input is stored:
> <div title='<%=user_data%>'>Howdy</div>
> if user_data = ' onclick='payload' ' 
> then an attacker can inject their code into the page even if the developer is using the
> string escape utils to escape the user string.
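
To make the quoted scenario concrete, the following Java sketch renders the template from
the issue with an escaper that handles &, <, > and the double quote but leaves the single
quote alone. The escaper here is a hypothetical stand-in written for this illustration; it
only mimics the behavior reported in the issue and is not the library's code.

```java
// Hypothetical demonstration, not part of Commons Lang.
final class SingleQuoteBreakoutDemo {

    // Stand-in escaper: covers & < > " but NOT the single quote,
    // which is the gap reported in LANG-1042.
    static String escapeWithoutSingleQuote(String input) {
        return input.replace("&", "&amp;")   // ampersand first, so later entities are not re-escaped
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Mirrors the template from the issue: <div title='<%=user_data%>'>Howdy</div>
    static String render(String userData) {
        return "<div title='" + escapeWithoutSingleQuote(userData) + "'>Howdy</div>";
    }

    public static void main(String[] args) {
        // The leading single quote in the input closes the title attribute,
        // so onclick is parsed as a separate attribute of the div.
        System.out.println(render("' onclick='payload' "));
    }
}
```

With the input from the issue, the rendered markup is
`<div title='' onclick='payload' '>Howdy</div>`: the title attribute is empty and the
attacker-controlled onclick attribute sits outside it.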



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
