commons-issues mailing list archives

From "Duncan Jones (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LANG-1056) StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
Date Fri, 24 Oct 2014 14:55:34 GMT

    [ https://issues.apache.org/jira/browse/LANG-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182854#comment-14182854
] 

Duncan Jones edited comment on LANG-1056 at 10/24/14 2:54 PM:
--------------------------------------------------------------

The Javadocs are not overly clear on this subject:

bq. If an entity is unrecognized, it is left alone, and inserted verbatim into the result
string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".

The tricky word here is "unrecognized". I think {{&#39511154;}} is recognised as an escaped
Unicode character, but it fails during conversion. That's probably a different scenario to
not _recognising_ an invalid entity like {{&zzz;}}.
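
To illustrate why the conversion step fails, here is a minimal sketch that uses only JDK calls. The class and method names are hypothetical; the point is that 39511154 lies above {{Character.MAX_CODE_POINT}}, so {{Character.toChars}} rejects it, which matches the top frame of the reported stack trace.

```java
public class CodePointRangeDemo {
    // Returns true when the JDK can convert the value to UTF-16 chars.
    static boolean convertible(int codePoint) {
        try {
            Character.toChars(codePoint);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for any value outside 0..Character.MAX_CODE_POINT.
            return false;
        }
    }

    public static void main(String[] args) {
        int fromIssue = 39511154; // the number inside the reported entity
        System.out.println(Character.MAX_CODE_POINT);               // 1114111 (0x10FFFF)
        System.out.println(Character.isValidCodePoint(fromIssue));  // false
        System.out.println(convertible(fromIssue));                 // false
        System.out.println(convertible(62));                        // true (62 is '>')
    }
}
```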

I would suggest the docs are vague enough to support action in either direction. We either
declare this is a bug and fix it or we decide it's good behaviour, but update the Javadocs
to make it clearer this will happen.

I welcome comments from others. I think the original intention here was for no exceptions
to be thrown, so I'd be in favour of calling this a bug.
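
If we do treat it as a bug, the "left alone" behaviour could look roughly like the sketch below. This is not the {{NumericEntityUnescaper}} implementation and not a proposed patch: the class name is hypothetical, it handles only decimal entities with a trailing semicolon, and it relies solely on {{java.util.regex}} and {{Character}} from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientNumericUnescape {
    // Decimal entities only, e.g. &#62; (hex forms are out of scope for this sketch).
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d+);");

    static String unescape(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the entity alone
            try {
                int codePoint = Integer.parseInt(m.group(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Digits overflow int, so this is not a code point; keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("test &#39511154;")); // entity is out of range, printed unchanged
        System.out.println(unescape("a &#62; b"));        // prints "a > b"
    }
}
```

With this approach an out-of-range number is handled the same way as an unknown named entity, so callers never see an exception from the unescape call.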


was (Author: dmjones500):
The Javadocs are not overly clear on this subject, but I would suggest this isn't a bug. The
docs say:

bq. If an entity is unrecognized, it is left alone, and inserted verbatim into the result
string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".

The tricky word here is "unrecognized". I think {{&#39511154;}} is recognised as an escaped
Unicode character, but it fails during conversion. That's probably a different scenario to
not _recognising_ an invalid entity like {{&zzz;}}.

I would suggest the docs are vague enough to support action in either direction. We either
declare this is a bug and fix it or we decide it's good behaviour, but update the Javadocs
to make it clearer this will happen.

I welcome comments from others. I think the original intention here was for no exceptions
to be thrown, so I'd be in favour of calling this a bug.

> StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
> ------------------------------------------------------------------
>
>                 Key: LANG-1056
>                 URL: https://issues.apache.org/jira/browse/LANG-1056
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.3.2
>         Environment: Ubuntu 64
>            Reporter: Jakub
>
> When I try to unescape 
> {code:java}
> String test = "test &#39511154;";
> StringEscapeUtils.unescapeHtml4(test);
> {code}
> I got:
> {noformat}
> java.lang.IllegalArgumentException
> 	at java.lang.Character.toChars(Character.java:4982)
> 	at org.apache.commons.lang3.text.translate.NumericEntityUnescaper.translate(NumericEntityUnescaper.java:128)
> 	at org.apache.commons.lang3.text.translate.AggregateTranslator.translate(AggregateTranslator.java:52)
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:85)
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
> 	at org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(StringEscapeUtils.java:627)
> 	at unescapeHtml4Test.Main.main(Main.java:10)
> {noformat}
> Is this a bug or not? Should the method return "test &#39511154;" without throwing an exception?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
