xalan-dev mailing list archives

From "Steven J. Hathaway" <shath...@e-z.net>
Subject Re: [jira] [Created] (XALANC-743) XalanOutputStream::transcode falls into infinite loop on 4 bytes unicode till out of memory
Date Mon, 29 Apr 2013 04:39:07 GMT
Unicode - FYI
Values between 0xD800 and 0xDBFF are high surrogates, not valid Unicode
characters on their own. In UTF-16 they indicate that a pair of 16-bit
values (a high surrogate followed by a low surrogate in 0xDC00 to 0xDFFF)
is used to resolve an actual Unicode codepoint.
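
As a rough illustration of how such a pair resolves to one codepoint, here
is a minimal C++ sketch. The helper names are hypothetical and are not part
of Xalan-C or Xerces-C; only the arithmetic comes from the UTF-16 encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only (not Xalan-C or Xerces-C APIs).

// True for the first half of a surrogate pair (0xD800..0xDBFF).
inline bool isHighSurrogate(std::uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// True for the second half of a surrogate pair (0xDC00..0xDFFF).
inline bool isLowSurrogate(std::uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine a high and a low surrogate into the codepoint they encode.
// The caller must pass a well-formed pair.
inline std::uint32_t combineSurrogates(std::uint16_t high, std::uint16_t low)
{
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

For example, the pair 0xD83D 0xDE00 resolves to U+1F600. A lone 0xD83D at
the end of a buffer cannot be resolved until the next 16-bit value arrives.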

When parsers attempt to resolve bad Unicode, the result is unreliable and
the operation probably should fail.  The Xerces-C library should fail
gracefully before exhausting all of the process's virtual heap memory.
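
One way to picture "failing gracefully" is a caller loop that stops as soon
as the transcoder makes no progress, instead of retrying forever. The sketch
below is only an assumption about how that could look: transcodeChunk is a
toy stand-in written for this example, not the Xerces-C transcoder API, and
transcodeAll is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy UTF-16 to UTF-8 transcoder (hypothetical, for illustration only).
// Appends UTF-8 bytes to `out` and returns how many 16-bit units it consumed.
// Like the behaviour described in the report, it does not consume a high
// surrogate that is the last unit of the input.
std::size_t transcodeChunk(const std::uint16_t* src, std::size_t len, std::string& out)
{
    std::size_t i = 0;
    while (i < len) {
        std::uint32_t cp = src[i];
        std::size_t used = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 == len) break;  // incomplete pair: leave it unconsumed
            const std::uint16_t low = src[i + 1];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000u + ((cp - 0xD800u) << 10) + (low - 0xDC00u);
            used = 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        i += used;
    }
    return i;
}

// Caller loop with a progress guard. Returns the number of units left
// unconsumed (0 or 1 here), so the caller can carry them into the next
// buffer or report an error at end of input.
std::size_t transcodeAll(const std::vector<std::uint16_t>& buf, std::string& out)
{
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::size_t eaten = transcodeChunk(buf.data() + pos, buf.size() - pos, out);
        assert(eaten <= buf.size() - pos);
        if (eaten == 0) break;  // no progress: stop instead of spinning
        pos += eaten;
    }
    return buf.size() - pos;
}
```

With this guard, a buffer ending in a lone high surrogate yields a leftover
count of 1 rather than an endless loop. Whether the leftover is carried over
or treated as an error is a design choice for the caller.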

You should send the issue to the ASF Xerces-C project.

Sincerely,
Steven J. Hathaway

On 4/19/2013 2:43 PM, Jiangbei Fan (JIRA) wrote:
> Jiangbei Fan created XALANC-743:
> -----------------------------------
>
>               Summary: XalanOutputStream::transcode falls into infinite loop on 4 bytes unicode till out of memory
>                   Key: XALANC-743
>                   URL: https://issues.apache.org/jira/browse/XALANC-743
>               Project: XalanC
>            Issue Type: Bug
>            Components: XalanC
>      Affects Versions: 1.10
>           Environment: Linux
>              Reporter: Jiangbei Fan
>              Assignee: Steven J. Hathaway
>
>
> In some rare cases, XalanTransformer::transform would get stuck or crash when the
> input/stylesheet contains 4-byte Unicode characters. I traced the root cause to
> XalanOutputStream::transcode.
>
> When the transcode buffer contains 4-byte Unicode characters and the last XalanDOMChar
> in the buffer is the first 2 bytes of a 4-byte character, XalanOutputStream::transcode
> falls into an infinite loop until it runs out of memory. This is because
> XMLUTF8Transcoder.cpp in Xerces does not consume the last 2 bytes if they are part of a
> 4-byte character, while transcode always loops until all chars in the buffer are eaten.
> Specifically, this happens when the last XalanDOMChar in the input buffer is between
> 0xD800 and 0xDBFF.
>
> I cannot find whether this issue has been reported before. This is version 1.10.  I do
> have a fix that adds a bool reference to the function, so that the caller can push the
> last 2 bytes back to the buffer if they were not consumed, but I want to check before
> submitting any fixes.
>
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@xalan.apache.org
> For additional commands, e-mail: dev-help@xalan.apache.org
>

