Date: Mon, 29 Apr 2013 16:38:17 +0000 (UTC)
From: "Steven J. Hathaway (JIRA)"
To: dev@xalan.apache.org
Subject: [jira] [Commented] (XALANC-743) XalanOutputStream::transcode falls into infinite loop on 4 bytes unicode till out of memory

    [ https://issues.apache.org/jira/browse/XALANC-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644627#comment-13644627 ]

Steven J. Hathaway commented on XALANC-743:
-------------------------------------------

You should check the encodings of your input file and stylesheet file. Do they begin with a byte-order mark for UTF-16, or do they assume the default UTF-8 encoding in the absence of a document identifier?

A similar bug, XERCESC-1987, has been reported and fixed in the Xerces-C subversion trunk. That artifact is related to a Chinese character. The fix will not be backported to Xerces-C version 2.8.

Steven J. Hathaway

> XalanOutputStream::transcode falls into infinite loop on 4 bytes unicode till out of memory
> -------------------------------------------------------------------------------------------
>
>                 Key: XALANC-743
>                 URL: https://issues.apache.org/jira/browse/XALANC-743
>             Project: XalanC
>          Issue Type: Bug
>          Components: XalanC
>    Affects Versions: 1.10
>         Environment: Linux
>            Reporter: Jiangbei Fan
>            Assignee: Steven J. Hathaway
>
> In some rare cases, XalanTransformer::transform hangs or crashes when the input or stylesheet contains 4-byte Unicode characters. I traced the root cause to XalanOutputStream::transcode.
> When the transcode buffer contains a 4-byte Unicode character and the last XalanDOMChar in the buffer is the first 2 bytes of that character, XalanOutputStream::transcode falls into an infinite loop until it runs out of memory. This happens because XMLUTF8Transcoder.cpp in Xerces does not consume the last 2 bytes if they are part of a 4-byte Unicode character, while transcode keeps looping until every character in the buffer has been consumed. Specifically, this happens when the last XalanDOMChar in the input buffer is between 0xD800 and 0xDBFF.
> I could not find out whether this issue has been reported before. This is version 1.10. I have a fix that adds a bool reference parameter to the function, so that the caller can push the last 2 bytes back into the buffer if they were not consumed, but I want to check here before submitting any fix.
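
The condition described in the report, a buffer whose last XalanDOMChar is a high surrogate (0xD800 to 0xDBFF), can be illustrated with a minimal C++ sketch. This is neither the reporter's patch nor Xalan-C code: the names isHighSurrogate, transcodableLength and takeTranscodable are hypothetical, and char16_t with std::u16string stands in for a XalanDOMChar buffer. The idea, under those assumptions, is to hold back a trailing high surrogate so that the transcoder is never handed only the first half of a surrogate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `unit` is a UTF-16 high (leading) surrogate, i.e. in 0xD800..0xDBFF.
inline bool isHighSurrogate(char16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// Hypothetical helper (not part of Xalan-C or Xerces-C): returns how many
// code units of `buffer` can be handed to a transcoder without splitting a
// surrogate pair. A trailing high surrogate is excluded from the count.
inline std::size_t transcodableLength(const std::u16string& buffer)
{
    if (!buffer.empty() && isHighSurrogate(buffer.back()))
        return buffer.size() - 1;
    return buffer.size();
}

// Hypothetical caller-side step: removes and returns the part of `pending`
// that is safe to transcode. Any held-back high surrogate stays in `pending`
// so that the next chunk can be appended after it.
inline std::u16string takeTranscodable(std::u16string& pending)
{
    const std::size_t n = transcodableLength(pending);
    assert(n <= pending.size());
    std::u16string ready = pending.substr(0, n);
    pending.erase(0, n);
    return ready;
}
```

A caller would transcode the returned units, then append the next chunk to whatever remains in `pending`, so that the low surrogate joins its high surrogate before the next call. A lone high surrogate left over at the very end of the stream is malformed input and would still need separate handling in the final flush.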