commons-issues mailing list archives

From "Miquel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-1022) NumericEntityUnescaper.translate throws an IllegalArgumentException if entityValue > MAX_CODE_POINT
Date Thu, 03 Jul 2014 08:24:24 GMT

    [ https://issues.apache.org/jira/browse/LANG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051180#comment-14051180 ]

Miquel commented on LANG-1022:
------------------------------

I expect it to follow the current behavior of NumericEntityUnescaper.translate(CharSequence input, Writer out):
- If the argument is a negative int like "&#-14844069;", it writes "&#14844069;".
- If the argument is a non-negative int no greater than MAX_CODE_POINT, like "&#55;", it writes the corresponding character ("7").
- If the argument doesn't start with "&#", it writes the input to the writer unchanged.
- If the argument is not marked as hex (no "x") but contains hex digits, like "&#aaaa;", the decimal parse throws a NumberFormatException, which is caught, and it writes the input "&#aaaa;" unchanged.
- If the argument is any other string, it writes the input to the output unchanged.

- It throws an IllegalArgumentException if the NumericEntityUnescaper is configured with the option errorIfNoSemiColon and the entity doesn't end with a semicolon.
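The "&#55;" and "&#aaaa;" cases above come down to which radix the entity body is parsed with. A small illustration, assuming decimal parsing for "&#NNN;" and hex parsing for "&#xHHHH;"; parseBody is a hypothetical helper, not part of Commons Lang:

```java
// Illustration only: how an entity body is parsed, assuming decimal for "&#NNN;"
// and hex for "&#xHHHH;". parseBody is a hypothetical helper, not Commons Lang code.
class EntityParseDemo {

    /** Parses the digits of an entity body, returning null if they are not valid in the radix. */
    static Integer parseBody(String body, int radix) {
        try {
            return Integer.parseInt(body, radix);
        } catch (NumberFormatException e) {
            return null; // e.g. "aaaa" is not a decimal number
        }
    }

    public static void main(String[] args) {
        Integer value = parseBody("55", 10);          // body of "&#55;"
        System.out.println(value);                    // 55
        System.out.println((char) value.intValue());  // 7
        System.out.println(parseBody("aaaa", 10));    // null: "&#aaaa;" is read as decimal
        System.out.println(parseBody("aaaa", 16));    // 43690: what "&#xaaaa;" would give
    }
}
```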

Why do you think the expected behavior is to throw an IllegalArgumentException if the value is a positive integer greater than MAX_CODE_POINT?

I would expect it to write the input to the output without throwing an exception.
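A minimal sketch of the behavior I would expect, using a hypothetical helper (LenientEntityUnescaper.unescapeEntity is not the Commons Lang API or source) that handles a single entity. It assumes the semicolon is required and simply copies anything it cannot decode, including values above Character.MAX_CODE_POINT, instead of letting Character.toChars throw:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch, not the Commons Lang implementation: unescape one "&#...;"
// entity and fall back to copying the input when it cannot be decoded.
class LenientEntityUnescaper {

    static void unescapeEntity(String entity, Writer out) throws IOException {
        if (!entity.startsWith("&#") || !entity.endsWith(";")) {
            out.write(entity); // not a terminated numeric entity: copy as-is
            return;
        }
        String body = entity.substring(2, entity.length() - 1);
        boolean isHex = body.startsWith("x") || body.startsWith("X");
        int value;
        try {
            value = Integer.parseInt(isHex ? body.substring(1) : body, isHex ? 16 : 10);
        } catch (NumberFormatException e) {
            out.write(entity); // e.g. "&#aaaa;": not a decimal number
            return;
        }
        // parseInt accepts a leading sign, so reject negatives as well as
        // values above MAX_CODE_POINT; copy the input instead of throwing.
        if (value < 0 || value > Character.MAX_CODE_POINT) {
            out.write(entity);
            return;
        }
        out.write(Character.toChars(value)); // also handles supplementary code points
    }

    static String unescape(String entity) {
        StringWriter out = new StringWriter();
        try {
            unescapeEntity(entity, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter does not actually throw
        }
        return out.toString();
    }
}
```

With this sketch, "&#55;" becomes "7", "&#x1F600;" becomes the surrogate pair for U+1F600, and both "&#aaaa;" and "&#14844069;" are copied unchanged.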


> NumericEntityUnescaper.translate throws an IllegalArgumentException if entityValue > MAX_CODE_POINT
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LANG-1022
>                 URL: https://issues.apache.org/jira/browse/LANG-1022
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.text.translate.*
>            Reporter: Miquel
>            Priority: Minor
>
> We found that StringEscapeUtils.unescapeHtml4 crashes with an IllegalArgumentException if the argument is "&#14844069;".
> This happens because internally it calls NumericEntityUnescaper.translate, which doesn't check whether the value is greater than 0x10FFFF (MAX_CODE_POINT); that check is performed inside Character.toChars.
> Maybe we need to check that the entity value is not greater than Character.MAX_CODE_POINT.
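The root cause can be reproduced with the JDK alone. This small demo (ToCharsDemo and toCharsAccepts are hypothetical names) shows that Character.toChars rejects 14844069 and that Character.isValidCodePoint is one way to express the check suggested above:

```java
// Demonstrates that Character.toChars rejects values above
// Character.MAX_CODE_POINT (0x10FFFF) with an IllegalArgumentException.
class ToCharsDemo {

    /** Returns true if Character.toChars accepts the value, false if it throws. */
    static boolean toCharsAccepts(int codePoint) {
        try {
            Character.toChars(codePoint);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int entityValue = 14844069; // from "&#14844069;" (0xE280A5)
        System.out.println(entityValue > Character.MAX_CODE_POINT);    // true
        System.out.println(toCharsAccepts(entityValue));               // false
        // The suggested check; isValidCodePoint also rejects negative values.
        System.out.println(Character.isValidCodePoint(entityValue));   // false
        System.out.println(toCharsAccepts(Character.MAX_CODE_POINT));  // true
    }
}
```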



--
This message was sent by Atlassian JIRA
(v6.2#6252)
