xerces-c-dev mailing list archives

From "Alberto Massari (JIRA)" <xerces-c-...@xml.apache.org>
Subject [jira] [Commented] (XERCESC-1967) Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header
Date Thu, 09 Jun 2011 12:44:58 GMT

    [ https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046508#comment-13046508

Alberto Massari commented on XERCESC-1967:

I don't agree with your request to reverse the priorities, but that discussion doesn't belong
here. Good luck trying to convince the W3C.
The XML spec says that the BOM and the internal encoding declaration take precedence when the
XML is in a *file*, because it is likely that no transcoding has been performed on it. In all
other scenarios (when the XML is in a byte stream), the component that does the wrapping
should take care of telling the parser the new setting. This is what Xerces does now,
and in my opinion it is correct and shouldn't be changed.
What is missing in Xerces is the capability of propagating the content type read from the
HTTP stream to the parser. Whether the content type is text/xml or application/xml only
affects which default encoding applies when the Content-Type header does not specify one. And
in case 8.20 an encoding is specified, so it doesn't matter which of the two (text/xml or
application/xml) was used.
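
To make the precedence concrete, here is a rough sketch of the rules described above. It is
not Xerces code: effective_encoding, sniff_encoding and their parameters are hypothetical
names, the US-ASCII default for text/xml is my reading of RFC 3023, and the sniffing covers
only UTF-8/UTF-16 BOMs and ASCII-compatible XML declarations.

```python
import codecs
import re

# Only UTF-8 and UTF-16 BOMs are handled in this sketch (UTF-32 is ignored).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document; works only for ASCII-compatible encodings.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data):
    """Guess the encoding from the document itself: BOM, then declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"  # XML default when nothing else is stated


def effective_encoding(data, charset=None, media_type="application/xml",
                       from_file=False):
    """Pick the encoding a parser should use (hypothetical helper).

    from_file=True models XML read from a file: only the BOM and the
    internal declaration count. Otherwise the transport information wins.
    """
    if from_file:
        return sniff_encoding(data)
    if charset:
        # An explicit charset applies to both text/xml and application/xml.
        return charset
    if media_type == "text/xml":
        # Assumption: RFC 3023 default for text/xml without a charset.
        return "US-ASCII"
    return sniff_encoding(data)
```

With a document that starts with a UTF-8 BOM, effective_encoding(data, from_file=True)
gives "UTF-8", while effective_encoding(data, charset="EUC-KR") gives "EUC-KR", which is
the situation discussed in this issue.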

In short, if you think that pparse (or saxcount) should refuse to parse your web page (which
has an HTTP content type specifying Korean, plus a UTF-8 BOM), I agree and I will try to
fix it.
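
The check I have in mind could look roughly like the sketch below. Again, this is not the
actual Xerces change: bom_conflicts_with_charset is a hypothetical helper, and treating an
unknown charset name as a conflict is an assumption made here for simplicity.

```python
import codecs


def bom_conflicts_with_charset(data, charset):
    """Return True if the bytes start with a UTF-8 BOM but the transport
    charset names a different encoding (hypothetical helper)."""
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no UTF-8 BOM, nothing to contradict
    try:
        # Normalise aliases such as "utf8" and "UTF-8" to one name.
        declared = codecs.lookup(charset).name
    except LookupError:
        # Assumption: an unrecognised charset cannot be confirmed as UTF-8.
        return True
    return declared not in ("utf-8", "utf-8-sig")
```

A caller would report a fatal error instead of parsing when this returns True, for example
for a page served with charset=EUC-KR whose body begins with the bytes EF BB BF.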

> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset
parameter of the HTTP content-type: header
> --------------------------------------------------------------------------------------------------------------------------------
>                 Key: XERCESC-1967
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1967
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 3.1.1
>         Environment: Mac OS X Snow Leopard (Intel).  (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> Also tested with the XMLmind XML editor on the same platform.
>            Reporter: Leif Halvard Silli
>   Original Estimate: 4h
>  Remaining Estimate: 4h
> [1] http://www.w3.org/mid/20110609033243875895.0f711adc@xn--mlform-iua.no
> [2] http://www.w3.org/mid/20110609090401531862.04ce13e8@xn--mlform-iua.no
> It is an XML 1.0 spec violation: a well-formedness violation.
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
