commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Sussland (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LANG-1042) StringEscapeUtils.escapeHtml() does not escape single quote
Date Tue, 14 Oct 2014 17:07:34 GMT

    [ https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171218#comment-14171218
] 

Robert Sussland edited comment on LANG-1042 at 10/14/14 5:07 PM:
-----------------------------------------------------------------

Is there any chance we could deprecate this method? Regardless of what the original author
was intending, they have now permeated the open source eco-system:

a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the results, users
are expecting html string escaping in both attribute values and html nodes, with over 14K
hits in JSPs where they are manually using this function to perform security encoding.

https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults

What is saving the majority of these users is the (Java) convention that html attributes be
double rather than single quoted. However, in Rails, attributes are single quoted by convention,
and in JS-land you see a mix. I.e. we cannot rely on users being accidentally safe when using
this function.

This function is also used to encode attribute values in Apache Struts. Spring MVC is not
importing the package but effectively copied the code in HtmlUtils.java. 

Then Grails copied Spring's encoder (c.f. package org.grails.encoder.impl) which was also
picked up by Wicket (wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java).


Point being, you have the vast majority of Java MVC frameworks now improperly escaping html
attributes and crucially relying on their double quote conventions for security. This is a
concern when they take these same libraries re-use them.

Here is the Play project, which for the most part uses the StringEscapeUtils methods, except
for HTML encoding.

{code}
/api/src/main/scala/play/twirl/api/Formats.scala

  def escape(text: String): Html = {
    // Using our own algorithm here because commons lang escaping wasn't designed for protecting
against XSS, and there
    // don't seem to be any other good generic escaping tools out there.
    val sb = new StringBuilder(text.length)
    text.foreach {
      case '<' => sb.append("&lt;")
      case '>' => sb.append("&gt;")
      case '"' => sb.append("&quot;")
      case '\'' => sb.append("&#x27;")
      case '&' => sb.append("&amp;")
      case c => sb += c
    }
    new Html(sb.toString)
  }
{code}

It would be nice if, instead of each developer needing to roll their own escaping library,
they could rely on a common string escaping library. The StringEscapeUtils does this except
for the html method -- what about deprecating this method and introducing a new one -- secureHtmlEscape
--  that escapes <, >, ', ", and &?

I understand that encoding is context sensitive. Generally you need to perform a sequence
of encodings to match the sequence of browser parsings, but the sequence is built from HTML,
JS, CSS, and URI encoding operations, so there is a need for a common definition of these.
Then frameworks and toolkits can perform the proper contextual encoding using these functions.



was (Author: rsussland):
Is there any chance we could deprecate this method? Regardless of what the original author
was intending, they have now permeated the open source eco-system:

a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the results, users
are expecting html string escaping in both attribute values and html nodes, with over 14K
hits in JSPs where they are manually using this function to perform security encoding.

https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults

What is saving the majority of these users is the (Java) convention that html attributes be
double rather than single quoted. However, in Rails, attributes are single quoted by convention,
and in JS-land you see a mix. I.e. we cannot rely on users being accidentally safe when using
this function.

This function is also used to encode attribute values in Apache Struts. Spring MVC is not
importing the package but effectively copied the code in HtmlUtils.java. Then Grails copied
Spring's encoder (c.f. package org.grails.encoder.impl;) which was also picked up by Wicket
(wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java). 

Point being, you have the vast majority of Java MVC frameworks now improperly escaping html
attributes and crucially relying on their double quote conventions for security. This is a
concern when they take these same libraries and use them for encoding data sent to the client,
or perform manual encoding in the controller with the unsafe escapeHTML function.  All because
of this escapeHtml function.

Here is the Play project, which for the most part uses the StringEscapeUtils methods.

{code}
/api/src/main/scala/play/twirl/api/Formats.scala

  def escape(text: String): Html = {
    // Using our own algorithm here because commons lang escaping wasn't designed for protecting
against XSS, and there
    // don't seem to be any other good generic escaping tools out there.
    val sb = new StringBuilder(text.length)
    text.foreach {
      case '<' => sb.append("&lt;")
      case '>' => sb.append("&gt;")
      case '"' => sb.append("&quot;")
      case '\'' => sb.append("&#x27;")
      case '&' => sb.append("&amp;")
      case c => sb += c
    }
    new Html(sb.toString)
  }
{code}

It would be nice if, instead of each developer needing to roll their own escaping library,
they could rely on a common string escaping library. The StringEscapeUtils does this except
for the html method -- what about deprecating this method and introducing a new one -- secureHtmlEscape
--  that escapes <, >, ', ", and &?


> StringEscapeUtils.escapeHtml() does not escape single quote
> -----------------------------------------------------------
>
>                 Key: LANG-1042
>                 URL: https://issues.apache.org/jira/browse/LANG-1042
>             Project: Commons Lang
>          Issue Type: Bug
>            Reporter: Robert Sussland
>            Priority: Critical
>
> The String Escape Utils should ensure that encoded data cannot escape from a string.
However in HTML (starting with 1.0 and until the present), attribute values may be denoted
by either single or double quotes. Therefore single quotes need to be escaped just as much
as double quotes. 
> From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
> {quote}
> By default, SGML requires that all attribute values be delimited using either double
quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote
marks can be included within the attribute value when the value is delimited by double quote
marks, and vice versa. Authors may also use numeric character references to represent double
quotes (&amp;#34\;) and single quotes (&amp;#39\;). For double quotes authors can
also use the character entity reference &amp;quot;.
> {quote}
> Note that there have been several bugs in the wild in which string encoders use this
library under the hood, and as a result fail to properly escape html attributes in which user
input is stored:
> <div title='<%=user_data%>'>Howdy</div>
> if user_data = ' onclick='payload' ' 
> then an attacker can inject their code into the page even if the developer is using the
string escape utils to escape the user string.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message