xml-general mailing list archives

From Jeff Turner <j...@socialchange.net.au>
Subject Re: JAXP problems: Latin-1 encoded files, and DTDs in jar files
Date Mon, 08 Apr 2002 03:20:51 GMT
On Mon, Apr 08, 2002 at 04:10:34AM +0200, eriksoe@daimi.au.dk wrote:
> Hello all,
> I've run into problems using the SAX parser in JAXP.
> 
> The first is that it seems impossible to get it to parse files encoded in
> anything other than UTF-8; in my case, I want it to read Latin-1 (ISO-8859-1).
> 
> What I get is an
> org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char
> -- is an XML encoding declaration missing?" (line number may be too low).
> 
> I have tried inserting encoding="ISO-8859-1" (or encoding="Latin-1") in the
> DOCTYPE tag of the XML file and in the <?xml?> tag of the DTD. (Actually, even if
> I supply nonsense encoding names, I get the same error message; the setting seems
> to be ignored altogether.)
> 
> I've also tried something like
>    InputSource is = new InputSource(new FileInputStream(filename));
>    is.setEncoding("ISO-8859-1");
>    xmlReader.parse(filename);
> but without success.
> 
> As my editor does not support UTF-8, I'd really like to be able to write Latin-1
> documents. Is there a solution?

Try passing the parser a URL (a system ID) instead of an InputSource. It's
generally better to let the parser work out the encoding itself: parsers
take into account the byte order mark, the encoding declaration, the MIME
type, etc.

See
http://www.opensource.socialchange.net.au/doctypechanger/latest/apidocs/overview-summary.html#doc.CharVSByte
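A minimal sketch of the URL approach, assuming a JAXP SAX parser on a
Java 7+ runtime. Note that the encoding declaration belongs in the XML
declaration on the very first line of each XML document, e.g.
<?xml version="1.0" encoding="ISO-8859-1"?>; the DOCTYPE does not take an
encoding, and a <?xml ...?> line inside an external DTD only describes the
DTD file itself. Also, the quoted snippet passes `filename` rather than
`is` to parse(), so the InputSource and its setEncoding() call are never
used. The class name Latin1Demo and the sample text are illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1Demo {
    /** Writes a small Latin-1 document (hypothetical sample) to a temp file. */
    static File writeLatin1Sample() throws IOException {
        // The encoding is declared in the XML declaration on the first
        // line of the document itself, not in the DOCTYPE or the DTD.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<name>Erik S\u00f8e S\u00f8rensen</name>\n";
        File file = File.createTempFile("latin1-demo", ".xml");
        file.deleteOnExit();
        try (OutputStream out = new FileOutputStream(file)) {
            // Store the text as Latin-1 bytes, as a non-UTF-8 editor would.
            out.write(xml.getBytes(StandardCharsets.ISO_8859_1));
        }
        return file;
    }

    /** Parses from a system ID (URL) and returns the character content. */
    static String readText(String systemId) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Given a URL, the parser opens the raw bytes itself and picks the
        // decoding from the byte order mark and the encoding declaration.
        parser.parse(systemId, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        File file = writeLatin1Sample();
        System.out.println(readText(file.toURI().toString()));
    }
}
```

If the file were decoded as UTF-8 instead, the lone 0xF8 byte for "ø"
would trigger the same "Malformed UTF-8 char" error quoted above.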

> The second problem is that our Java program is going to be put into a jar file,
> along with its data: some XML files and their DTD.
> I know how to make the XmlParser read the XML files, but the parser cannot find
> the DTD (and I can't just specify a URL in the DOCTYPE tag of the XML files, as
> the URLs of jar resources are platform-dependent). How do you make the parser
> search for the DTD among the resources in the jar file?

The standard way is to write a custom EntityResolver that loads the DTD
from the jar and returns its byte stream to the parser when asked.

Example at:

http://www.opensource.socialchange.net.au/doctypechanger/latest/apidocs/overview-summary.html#doc.Validating
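A minimal sketch of such a resolver, assuming the DTD is packaged in the
jar as the resource "dtd/example.dtd" and the XML files refer to it with
the relative system ID "example.dtd"; both names, the class JarDtdResolver,
and the helpers are hypothetical. To keep the example self-contained, a
temporary directory on a URLClassLoader stands in for the jar; in the real
program you would pass something like getClass().getClassLoader().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Serves DTDs from class-loader resources, e.g. from inside the program's jar. */
public class JarDtdResolver implements EntityResolver {
    private final ClassLoader loader;

    public JarDtdResolver(ClassLoader loader) {
        this.loader = loader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Hypothetical mapping: any system ID ending in "example.dtd" is
        // served from the resource "dtd/example.dtd" on the class path.
        if (systemId != null && systemId.endsWith("example.dtd")) {
            InputStream in = loader.getResourceAsStream("dtd/example.dtd");
            if (in != null) {
                InputSource source = new InputSource(in);
                source.setPublicId(publicId);
                source.setSystemId(systemId);
                return source;
            }
        }
        return null; // anything else: let the parser resolve it as usual
    }

    /** Parses with DTD validation on; any validity error is thrown. */
    static void parseValidating(String xml, ClassLoader loader) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new JarDtdResolver(loader));
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validity errors as fatal
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
    }

    /** Stand-in for the jar: a temp directory holding dtd/example.dtd. */
    static ClassLoader demoLoader() throws IOException {
        Path root = Files.createTempDirectory("jar-stand-in");
        Path dtd = root.resolve("dtd").resolve("example.dtd");
        Files.createDirectories(dtd.getParent());
        Files.write(dtd, "<!ELEMENT note (#PCDATA)>\n".getBytes(StandardCharsets.US_ASCII));
        return new URLClassLoader(new URL[] { root.toUri().toURL() });
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = demoLoader();
        parseValidating("<!DOCTYPE note SYSTEM \"example.dtd\"><note>hi</note>", loader);
        System.out.println("valid document accepted");
    }
}
```

Because the resolver hands back a byte stream, the XML files can keep a
plain relative system ID in their DOCTYPE, and nothing depends on the
platform-specific URL of the jar.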

--Jeff

> (Our program has no problems with file locations before being jarred, so I'd of
> course like to know a minimal-impact solution, if one exists.)
> 
>     Erik Søe Sørensen
> --
> Lights. Panic. Action.
> 

---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org

