Date: Thu, 9 Jun 2011 12:07:02 +0000 (UTC)
From: "Leif Halvard Silli (JIRA)"
To: c-dev@xerces.apache.org
Message-ID: <1166879.6620.1307621222434.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1593866828.5965.1307607358985.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (XERCESC-1967) Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header

[
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046493#comment-13046493 ]

Leif Halvard Silli commented on XERCESC-1967:
---------------------------------------------

Note that case 8.20 (http://tools.ietf.org/html/rfc3023#section-8.20) is 'text/xml'. The RFC does not discuss transcoding for application/xml (which e.g. 'application/xhtml+xml' is a subtype of - so says Mark Pilgrim at least: http://feedparser.org/docs/character-encoding.html ). For application/xml, the RFC only presents a "because the HTTP RFC says so" justification: http://tools.ietf.org/html/rfc3023#section-3.2 And transcoding should not happen for application/xml, as far as I understand.

Note, also, that all this started because of a bug against HTML5/XHTML5: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897

In my view (outlined in the bug), one should consider giving priority to the (UTF-8) BOM over both HTTP and the XML encoding declaration. This is in the "interests of interoperability", as XML 1.0 puts it (http://www.w3.org/TR/xml/#sec-guessing-with-ext-info).

> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESC-1967
> URL: https://issues.apache.org/jira/browse/XERCESC-1967
> Project: Xerces-C++
> Issue Type: Bug
> Components: Non-Validating Parser
> Affects Versions: 3.1.1
> Environment: Mac OS X Snow Leopard (Intel). (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> And also tested the XMLmind XML editor on the same platform.
> Reporter: Leif Halvard Silli
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> [1] http://www.w3.org/mid/20110609033243875895.0f711adc@xn--mlform-iua.no
> [2] http://www.w3.org/mid/20110609090401531862.04ce13e8@xn--mlform-iua.no
> It is an XML 1.0 spec violation (a well-formedness violation).
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.
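
For illustration only, the priority order proposed in the comment (BOM first, then the HTTP charset parameter, then the XML encoding declaration) could be sketched roughly as below. This is not how Xerces-C++ behaves and not code from the issue. The function name pick_encoding, the "utf-8" fallback, and the simplified declaration regex are all assumptions made for this sketch. Ranking the HTTP charset above the XML declaration follows RFC 3023's treatment of the charset parameter as authoritative.

```python
import codecs
import re

# BOM signatures. UTF-32 comes before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified match for encoding="..." in an XML declaration at the very
# start of the document. Assumption: the document is ASCII-compatible up
# to the end of the declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def pick_encoding(body, http_charset=None):
    """Return (encoding, source) using the proposed priority order.

    Hypothetical helper: BOM > HTTP charset > XML declaration > default.
    """
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    if http_charset:
        return http_charset.lower(), "http"
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower(), "xml-declaration"
    # Assumption: fall back to UTF-8 when nothing else is available.
    return "utf-8", "default"
```

With this ordering, a document that starts with a UTF-8 BOM is treated as UTF-8 even if it was served with "charset=iso-8859-1", which is the behaviour the comment argues for.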