commons-issues mailing list archives

From "Ilguiz Latypov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TEXT-42) [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
Date Sun, 29 Oct 2017 15:45:00 GMT

    [ https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929
] 

Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:44 PM:
--------------------------------------------------------------

I wonder if escapeEcmaScript()'s use cases can be scrutinized.

* Outputting a standalone javascript file containing string literals.  The generation of string
literals to be surrounded by double or single quotes seems to be covered by the existing code
in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing javascript containing string literals.  This needs
a new method *escapeHtmlAttr*.  Depending on the surrounding quotes or absence of them, all
characters of the attribute value will go through either a minimal substitution of [single/double
quotes and ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
with the HTML entity or through a broader replacement of [whitespace, ampersand, single/double
quotes, equals, greater/less-than and backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
Safety calls for using the broader escaping by default (and allowing the narrow one as an option).
I.e.
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + escapeEcmaScript(input) +
dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code *lacks protection*
against the script's end tag taking precedence over any contents.  Because browsers allow
readable javascript between the script tags, browsers [stopped applying a straight decoding
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
similar to the one used in HTML attributes.  That escapeEcmaScript() leaves the ampersand character
unescaped therefore agrees with the HTML parsers.  According to the WHATWG HTML parsing rules, the
end script tag </script> will terminate the script element regardless of the javascript parsing
state, even inside a string literal.  Changing escapeEcmaScript()
to *escape the less-than character* (with either the backslash-x notation or with a simple
backslash prefix) will prevent *XSS attacks that inject the end script tag* </script>.
Escaping the greater-than character does not seem necessary but would look symmetrical to
escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + ")</script>");
{code}
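
To make the second and third use cases concrete, here is a minimal sketch under stated assumptions. Neither method exists in Commons Text: escapeHtmlAttr is the proposed new method, implemented here with the broader (unquoted-attribute) escaping, and escapeLessThan is a hypothetical post-processing step standing in for the proposed change to escapeEcmaScript(). The class name and the choice of numeric character references are also assumptions. The call to escapeEcmaScript() is left out so that the sketch runs without the library on the classpath.

```java
final class EscapeSketch {
    private EscapeSketch() {}

    /**
     * Hypothetical escapeHtmlAttr: the broader escaping, intended to be safe
     * for double-quoted, single-quoted and unquoted attribute values alike.
     */
    static String escapeHtmlAttr(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '=':  sb.append("&#61;");  break;
                case '`':  sb.append("&#96;");  break;
                default:
                    if (Character.isWhitespace(c)) {
                        // Whitespace would end an unquoted attribute value.
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /**
     * Hypothetical extra step for script tag contents: rewrite '<' in an
     * already javascript-escaped string with the backslash-x notation, so
     * the literal text "</script>" cannot appear inside a string literal.
     */
    static String escapeLessThan(String jsEscaped) {
        return jsEscaped.replace("<", "\\x3C");
    }
}
```

With Commons Text available, the third use case would then presumably read escapeLessThan(StringEscapeUtils.escapeEcmaScript(input)), and the second one escapeHtmlAttr applied to the whole attribute value as in the example above.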



was (Author: ilatypov):
I wonder if escapeEcmaScript()'s use cases can be scrutinized.

* Outputting a standalone javascript file containing string literals.  The generation of string
literals to be surrounded by double or single quotes seems to be covered by the existing code
in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing javascript containing string literals.  This needs
a new method *escapeHtmlAttr*.  Depending on the surrounding quotes or absence of them, all
characters of the attribute value will go through either a minimal substitution of [single/double
quotes and ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
with the HTML entity or through a broader replacement of [whitespace, ampersand, single/double
quotes, equals, greater/less-than and backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
Safety calls for using the broader escaping by default (and allowing the narrow one as an option).
I.e.
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq + escapeEcmaScript(input) +
dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code *lacks protection*
against the script's end tag taking precedence over any contents.  Because browsers allow
readable javascript between the script tags, browsers [stopped applying a straight decoding
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
similar to one in HTML attributes.  The code in escapeEcmaScript() *must escape the less-than
character* (with either the backslash-x notation or with a simple backslash prefix).  Assuming
that browsers may keep applying their HTML entity decoding throughout the script tag contents,
encoding ampersands with the backslash-x notation or single backslash seems necessary.  Escaping
the greater-than character does not seem necessary but would look symmetrical to escaping
the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq + ")</script>");
{code}


> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
>                 Key: TEXT-42
>                 URL: https://issues.apache.org/jira/browse/TEXT-42
>             Project: Commons Text
>          Issue Type: Bug
>            Reporter: Andy Reek
>              Labels: XSS
>             Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript does the escape via a prefixed
'\' on all characters which must be escaped. I am not sure if this is really secure, given
the comments on https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
They say it is possible to mount an attack by escaping the escape. I tested this with the string
'\"' and the output was '\\\"'. Is this really ecma-/java-script secure? Or is it better to
use the implementation used by OWASP?
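
The test described above can be reproduced in spirit with a small stand-in escaper. This is only a sketch: escapeJs below is a hypothetical simplification that prefixes a backslash to the backslash, both quote characters and the forward slash, and it is not the library's implementation. It shows why the reported output is what one would expect when the backslash itself is among the escaped characters: the input backslash becomes an escaped backslash and the input quote becomes an escaped quote, so the quote cannot end up unescaped.

```java
final class EscapeTheEscapeDemo {
    private EscapeTheEscapeDemo() {}

    // Stand-in escaper (an assumption, not the library code): prefixes a
    // backslash to backslash, double quote, single quote and forward slash.
    static String escapeJs(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' || c == '"' || c == '\'' || c == '/') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input is the two characters backslash and double quote.
        System.out.println(escapeJs("\\\""));
    }
}
```

An escaper that left the backslash alone would turn the same input into two backslashes followed by a bare quote, which a javascript parser reads as one literal backslash and then the end of the string. That is the "escape the escape" attack the cheat sheet warns about.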



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
