Return-Path: X-Original-To: apmail-commons-issues-archive@minotaur.apache.org Delivered-To: apmail-commons-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A4E6A17C7E for ; Tue, 14 Oct 2014 17:07:34 +0000 (UTC) Received: (qmail 36613 invoked by uid 500); 14 Oct 2014 17:07:34 -0000 Delivered-To: apmail-commons-issues-archive@commons.apache.org Received: (qmail 36517 invoked by uid 500); 14 Oct 2014 17:07:34 -0000 Mailing-List: contact issues-help@commons.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: issues@commons.apache.org Delivered-To: mailing list issues@commons.apache.org Received: (qmail 36501 invoked by uid 99); 14 Oct 2014 17:07:34 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 Oct 2014 17:07:34 +0000 Date: Tue, 14 Oct 2014 17:07:34 +0000 (UTC) From: "Robert Sussland (JIRA)" To: issues@commons.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (LANG-1042) StringEscapeUtils.escapeHtml() does not escape single quote MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171218#comment-14171218 ] Robert Sussland edited comment on LANG-1042 at 10/14/14 5:07 PM: ----------------------------------------------------------------- Is there any chance we could deprecate this method? Regardless of what the original author was intending, they have now permeated the open source eco-system: a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the results, users are expecting html string escaping in both attribute values and html nodes, with over 14K hits in JSPs where they are manually using this function to perform security encoding. https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults What is saving the majority of these users is the (Java) convention that html attributes be double rather than single quoted. However, in Rails, attributes are single quoted by convention, and in JS-land you see a mix. I.e. we cannot rely on users being accidentally safe when using this function. This function is also used to encode attribute values in Apache Struts. Spring MVC is not importing the package but effectively copied the code in HtmlUtils.java. Then Grails copied Spring's encoder (c.f. package org.grails.encoder.impl) which was also picked up by Wicket (wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java). Point being, you have the vast majority of Java MVC frameworks now improperly escaping html attributes and crucially relying on their double quote conventions for security. This is a concern when they take these same libraries re-use them. Here is the Play project, which for the most part uses the StringEscapeUtils methods, except for HTML encoding. {code} /api/src/main/scala/play/twirl/api/Formats.scala def escape(text: String): Html = { // Using our own algorithm here because commons lang escaping wasn't designed for protecting against XSS, and there // don't seem to be any other good generic escaping tools out there. val sb = new StringBuilder(text.length) text.foreach { case '<' => sb.append("<") case '>' => sb.append(">") case '"' => sb.append(""") case '\'' => sb.append("'") case '&' => sb.append("&") case c => sb += c } new Html(sb.toString) } {code} It would be nice if, instead of each developer needing to roll their own escaping library, they could rely on a common string escaping library. The StringEscapeUtils does this except for the html method -- what about deprecating this method and introducing a new one -- secureHtmlEscape -- that escapes <, >, ', ", and &? I understand that encoding is context sensitive. Generally you need to perform a sequence of encodings to match the sequence of browser parsings, but the sequence is built from HTML, JS, CSS, and URI encoding operations, so there is a need for a common definition of these. Then frameworks and toolkits can perform the proper contextual encoding using these functions. was (Author: rsussland): Is there any chance we could deprecate this method? Regardless of what the original author was intending, they have now permeated the open source eco-system: a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the results, users are expecting html string escaping in both attribute values and html nodes, with over 14K hits in JSPs where they are manually using this function to perform security encoding. https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults What is saving the majority of these users is the (Java) convention that html attributes be double rather than single quoted. However, in Rails, attributes are single quoted by convention, and in JS-land you see a mix. I.e. we cannot rely on users being accidentally safe when using this function. This function is also used to encode attribute values in Apache Struts. Spring MVC is not importing the package but effectively copied the code in HtmlUtils.java. Then Grails copied Spring's encoder (c.f. package org.grails.encoder.impl;) which was also picked up by Wicket (wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java). Point being, you have the vast majority of Java MVC frameworks now improperly escaping html attributes and crucially relying on their double quote conventions for security. This is a concern when they take these same libraries and use them for encoding data sent to the client, or perform manual encoding in the controller with the unsafe escapeHTML function. All because of this escapeHtml function. Here is the Play project, which for the most part uses the StringEscapeUtils methods. {code} /api/src/main/scala/play/twirl/api/Formats.scala def escape(text: String): Html = { // Using our own algorithm here because commons lang escaping wasn't designed for protecting against XSS, and there // don't seem to be any other good generic escaping tools out there. val sb = new StringBuilder(text.length) text.foreach { case '<' => sb.append("<") case '>' => sb.append(">") case '"' => sb.append(""") case '\'' => sb.append("'") case '&' => sb.append("&") case c => sb += c } new Html(sb.toString) } {code} It would be nice if, instead of each developer needing to roll their own escaping library, they could rely on a common string escaping library. The StringEscapeUtils does this except for the html method -- what about deprecating this method and introducing a new one -- secureHtmlEscape -- that escapes <, >, ', ", and &? > StringEscapeUtils.escapeHtml() does not escape single quote > ----------------------------------------------------------- > > Key: LANG-1042 > URL: https://issues.apache.org/jira/browse/LANG-1042 > Project: Commons Lang > Issue Type: Bug > Reporter: Robert Sussland > Priority: Critical > > The String Escape Utils should ensure that encoded data cannot escape from a string. However in HTML (starting with 1.0 and until the present), attribute values may be denoted by either single or double quotes. Therefore single quotes need to be escaped just as much as double quotes. > From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2 > {quote} > By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34\;) and single quotes (&#39\;). For double quotes authors can also use the character entity reference &quot;. > {quote} > Note that there have been several bugs in the wild in which string encoders use this library under the hood, and as a result fail to properly escape html attributes in which user input is stored: >
Howdy
> if user_data = ' onclick='payload' ' > then an attacker can inject their code into the page even if the developer is using the string escape utils to escape the user string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)