commons-issues mailing list archives

From "Duncan Jones (JIRA)" <>
Subject [jira] [Commented] (LANG-1056) StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
Date Fri, 24 Oct 2014 14:54:35 GMT


Duncan Jones commented on LANG-1056:

The Javadocs are not overly clear on this subject, but I would suggest this isn't a bug. The
docs say:

bq. If an entity is unrecognized, it is left alone, and inserted verbatim into the result
string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".

The tricky word here is "unrecognized". I think {{&#39511154;}} is recognised as an escaped
Unicode character, but it fails during conversion. That's probably a different scenario to
not _recognising_ an invalid entity like {{&zzz;}}.
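To illustrate why the conversion step fails, here is a minimal JDK-only sketch. It assumes, based on the stack trace in the report, that the exception originates in {{Character.toChars}}; the class and method names ({{CodePointCheck}}, {{convertible}}, {{describe}}) are hypothetical and not part of Commons Lang.

```java
public class CodePointCheck {

    // True when the value lies in the Unicode range 0..0x10FFFF,
    // i.e. when Character.toChars will accept it.
    static boolean convertible(int codePoint) {
        return Character.isValidCodePoint(codePoint);
    }

    // Converts a code point to a String, or returns the marker "invalid"
    // when Character.toChars rejects it with IllegalArgumentException.
    static String describe(int codePoint) {
        try {
            return new String(Character.toChars(codePoint));
        } catch (IllegalArgumentException e) {
            return "invalid";
        }
    }

    public static void main(String[] args) {
        int value = 39511154; // the number inside the reported entity

        System.out.println(Character.MAX_CODE_POINT); // 1114111 (0x10FFFF)
        System.out.println(convertible(value));       // false
        System.out.println(describe(value));          // invalid
        System.out.println(describe(62));             // >
    }
}
```

The entity parses as a number without trouble, which is why it counts as "recognised"; it is only the range check inside {{Character.toChars}} that rejects it.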

I would suggest the docs are vague enough to support action in either direction. We either
declare this a bug and fix it, or decide the current behaviour is acceptable and update the
Javadocs to make it clear that an exception will be thrown.

I welcome comments from others. I think the original intention here was for no exceptions
to be thrown, so I'd be in favour of calling this a bug.
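If we do treat it as a bug, one possible shape for the "no exceptions" behaviour is sketched below. This is a standalone illustration using only the JDK, not the actual {{NumericEntityUnescaper}} code and not a proposed patch; the class {{LenientNumericUnescaper}} and its methods are hypothetical, and it handles decimal entities only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientNumericUnescaper {

    // Matches decimal numeric entities such as &#62; (hex entities are out of scope here).
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d+);");

    // Replaces each decimal entity with its character, leaving any entity
    // that cannot be converted exactly as it appeared in the input.
    static String unescape(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            out.append(decodeOrKeep(m.group(1), m.group()));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Returns the decoded character(s), or the original entity text when the
    // number overflows an int or is not a valid Unicode code point.
    static String decodeOrKeep(String digits, String original) {
        try {
            int codePoint = Integer.parseInt(digits);
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
        } catch (NumberFormatException e) {
            // Too many digits to fit in an int: fall through and keep the text.
        }
        return original;
    }

    public static void main(String[] args) {
        System.out.println(unescape("test &#39511154;")); // test &#39511154;
        System.out.println(unescape("a &#62; b"));        // a > b
    }
}
```

With this approach an out-of-range entity is handled the same way the Javadoc describes for unrecognised entities: it is copied verbatim into the result.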

> StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
> ------------------------------------------------------------------
>                 Key: LANG-1056
>                 URL:
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.3.2
>         Environment: Ubuntu 64
>            Reporter: Jakub
> When I try to unescape 
> {code:java}
> String test = "test &#39511154;";
> StringEscapeUtils.unescapeHtml4(test);
> {code}
> I got:
> {noformat}
> java.lang.IllegalArgumentException
> 	at java.lang.Character.toChars(
> 	at org.apache.commons.lang3.text.translate.NumericEntityUnescaper.translate(
> 	at org.apache.commons.lang3.text.translate.AggregateTranslator.translate(
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(
> 	at org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(
> 	at unescapeHtml4Test.Main.main(
> {noformat}
> Is this a bug or not? Should the method return "test &#39511154" without throwing an exception?
