xerces-j-dev mailing list archives

From "Michael Glavassevich (JIRA)" <xerces-j-...@xml.apache.org>
Subject [jira] Resolved: (XERCESJ-1041) Xerces C++ defines an encoding-string that Xerces/Java refuses to parse
Date Thu, 10 Feb 2005 05:43:12 GMT
     [ http://issues.apache.org/jira/browse/XERCESJ-1041?page=history ]
Michael Glavassevich resolved XERCESJ-1041:

    Resolution: Won't Fix

The encoding names that Xerces-J recognizes are restricted to those registered with IANA [1].

Name: ISO_8859-1:1987                                    [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1

Above are the aliases registered for ISO-8859-1. Xerces-J recognizes all of them. Note that
ISO8859-1 is not in this list. I believe the XML spec recommends the usage of IANA names to
increase the portability of XML documents across parser implementations. Supporting unregistered
encoding names harms document portability. The problem you've run into demonstrates that.
Many other parsers won't have any idea what encoding "ISO8859-1" is, since it isn't
registered, so you would still have an interoperability problem.

[1] http://www.iana.org/assignments/character-sets
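
The alias check described above can be illustrated with a minimal sketch. This is not Xerces-J's actual implementation; the class name Iso88591Aliases and the method isRegistered are hypothetical, the set holds only the names from the IANA entry quoted in this message, and matching is assumed to be case-insensitive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only (not part of Xerces-J).
final class Iso88591Aliases {

    // Names copied from the IANA entry quoted in this message, stored in
    // upper case so that lookups can ignore case (an assumption of this sketch).
    private static final Set<String> REGISTERED = new HashSet<>(Arrays.asList(
            "ISO_8859-1:1987", "ISO-IR-100", "ISO_8859-1", "ISO-8859-1",
            "LATIN1", "L1", "IBM819", "CP819", "CSISOLATIN1"));

    // Returns true only if the name is one of the registered aliases listed above.
    static boolean isRegistered(String name) {
        return name != null && REGISTERED.contains(name.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-1", "latin1", "ISO8859-1"}) {
            System.out.println(name + " -> " + isRegistered(name));
        }
    }
}
```

With this list, "ISO-8859-1" and "latin1" are accepted, while "ISO8859-1" (no dash after "ISO") is rejected because it is not among the registered names.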

> Xerces C++ defines an encoding-string that Xerces/Java refuses to parse
> -----------------------------------------------------------------------
>          Key: XERCESJ-1041
>          URL: http://issues.apache.org/jira/browse/XERCESJ-1041
>      Project: Xerces2-J
>         Type: Bug
>     Versions: 2.4.0
>  Environment: XercesC-2.3, XalanJ 2.4, Solaris 6
>     Reporter: Dominik Stadler

> We are using Xerces C++ to create XML-Messages that are later parsed by Xerces/Java.
> XercesC provides a define XMLUni::fgISO88591EncodingString for setting the encoding,
> the XML-Message contains the string "ISO8859-1" as encoding.
> When we later use Xerces/Java to parse this file, we get the following error:
> [Fatal Error] :1:43: Invalid encoding name "ISO8859-1".
> It seems that Xerces/Java only knows the encoding "ISO-8859-1" (with a dash), but not
> "ISO8859-1" (without dash).
> The XML-Specification states that "ISO-8859-1" (with a dash) SHOULD be used, look at
> So in my opinion either Xerces C++ should not provide that define any more, or Xerces/Java
> should be enhanced to accept that encoding-string. Otherwise XercesC and XercesJ differ in
> this part, where we until now thought they would be equal in their parsing-behavior.
> I have already reported this for XercesC at http://issues.apache.org/jira/browse/XERCESC-1336.

