xml-general mailing list archives

From erik...@daimi.au.dk
Subject Re: JAXP problems: Latin-1 encoded files, and DTDs in jar files
Date Mon, 08 Apr 2002 04:14:31 GMT
Quoting Jeff Turner <jeff@socialchange.net.au>:

> On Mon, Apr 08, 2002 at 04:10:34AM +0200, eriksoe@daimi.au.dk wrote:
> > Hello all,
> > I've run into problems using the SAX parser in JAXP.
> > 
> > The first is that it seems impossible to get it to parse files encoded
> > in something else than UTF-8 - in my case, I want it to read Latin1
> > (ISO-8859-1). [...]
> 
> Try just passing the parser a URL, instead of an InputSource. It's
> generally better to let the parser figure out the encoding itself.
> Parsers will take into effect the byte order mark, encoding
> declaration, MIME type, etc.
> 
> See
> http://www.opensource.socialchange.net.au/doctypechanger/latest/apidocs/overview-summary.html#doc.CharVSByte
> 
> > The second problem is that our java program is going to be put into a
> > jar file, along with its data: some XML files and their DTD.
> > I know how to make the XmlParser read the XML files, but they cannot
> > find the DTD (and I can't just specify an URL in the DOCTYPE tag of the XML
> > files, as the URL for jar resources are platform dependent). [...]
> 
> Standard way is to write a custom EntityResolver which loads the DTD
> from the jar, and returns the byte stream to the parser when asked.
> 
> Example at:
>
> http://www.opensource.socialchange.net.au/doctypechanger/latest/apidocs/overview-summary.html#doc.Validating

Thanks for both pointers.
I've now solved both problems -

The first one was caused by the fact that I had only tried putting the 'encoding'
attribute in the <?xml?> header of the DTD, and in the DOCTYPE tag in the XML file.
I hadn't tried giving the XML file itself an <?xml?> header and putting it there
(I don't know much about the formalities of XML). A clue was that when I tried
parsing it in a specific way (which of course I can't reconstruct now), the
parser said it didn't like my DOCTYPE header (because I had put an encoding
attribute in it). Anyway, having an <?xml?> header in XML documents (placed
before the DOCTYPE header) seems to be a good thing - the one in the DTD does
not affect the XML file (sorry if this is trivial to everyone but me[1]).
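
A minimal sketch of the encoding fix described above, assuming the JDK's
built-in JAXP parser. The class name, method name and sample document are
made up for illustration. The parser gets raw bytes, so it has to pick the
decoding from the <?xml?> header; the bytes here are Latin-1, and the header
placed before the DOCTYPE says so.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original program.
class Latin1DeclarationDemo {

    /** Parses raw XML bytes and returns all character data as one string. */
    static String parseText(byte[] xmlBytes) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // A byte stream (not a Reader) leaves the decoding to the parser,
        // which reads the encoding from the XML declaration.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xmlBytes), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The <?xml?> header comes first, then the DOCTYPE, then the content.
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                   + "<!DOCTYPE name [<!ELEMENT name (#PCDATA)>]>\n"
                   + "<name>S\u00f8rensen</name>";
        // Encode as Latin-1, so the o-slash becomes the single byte 0xF8.
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1));
    }
}
```

Without the encoding attribute the parser would assume UTF-8, in which the
byte 0xF8 is not valid on its own.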

As for the second problem, about DTDs in Jar file resources, this works fine
(also when the program is not jarred - which is nice):
    URL url = ThisFile.class.getResource("/"+filename);
    if (url==null)
        throw new FileNotFoundException("No such resource: "+filename);
    xmlReader.parse(url.toExternalForm());

This is good because I get URLs without having to worry about platform
dependence. Because they are URLs, I don't have to change the relative
DTD locations in the XML files. There was a small-impact solution after all...
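
A self-contained sketch of why the relative DTD location keeps working,
assuming the JDK's built-in parser with its default settings. To stay
runnable without a jar, it writes a sample document and DTD to a temporary
directory and uses that file: URL as a stand-in for the URL returned by
getResource(); the class, file and entity names are all made up. The parser
resolves "doc.dtd" against the URL it was given as the system ID.

```java
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original program.
class RelativeDtdDemo {

    /** Parses the document at the given URL and returns its character data. */
    static String parseText(URL url) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        // Passing the URL as a string makes it the base for relative system IDs.
        reader.parse(url.toExternalForm());
        return text.toString();
    }

    /** Writes doc.xml and doc.dtd side by side and returns the URL of doc.xml. */
    static URL writeSample() throws Exception {
        Path dir = Files.createTempDirectory("dtd-demo");
        String dtd = "<!ELEMENT doc (#PCDATA)>\n"
                   + "<!ENTITY greeting \"hello\">\n";
        String xml = "<?xml version=\"1.0\"?>\n"
                   + "<!DOCTYPE doc SYSTEM \"doc.dtd\">\n"
                   + "<doc>&greeting;</doc>\n";
        Files.write(dir.resolve("doc.dtd"), dtd.getBytes(StandardCharsets.US_ASCII));
        Path xmlFile = dir.resolve("doc.xml");
        Files.write(xmlFile, xml.getBytes(StandardCharsets.US_ASCII));
        return xmlFile.toUri().toURL();
    }

    public static void main(String[] args) throws Exception {
        // The entity is defined only in the DTD, so it can be expanded
        // only if the relative DTD reference was resolved.
        System.out.println(parseText(writeSample()));
    }
}
```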

The only new problem is: is it possible to get a listing of (certain) resource
files in one's jar file? (We used to use File.listFiles() for that purpose - that
just doesn't work inside a jar file.)
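
One possible approach, sketched here under assumptions rather than offered
as a tested answer for this program: open the jar with java.util.jar.JarFile
and filter its entry names. The class name, the "data/" prefix and the
".xml" suffix are made up for illustration, and the sketch assumes the path
of the jar is already known (for example from the command line); finding
the program's own jar at run time is a separate question that it does not
address.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper class; not part of the original program.
class JarResourceLister {

    /** Returns the names of non-directory entries matching prefix and suffix. */
    static List<String> list(Path jarPath, String prefix, String suffix) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Entry names always use '/' as separator, on every platform.
                if (!entry.isDirectory() && name.startsWith(prefix) && name.endsWith(suffix)) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarResourceLister path/to/program.jar
        for (String name : list(Paths.get(args[0]), "data/", ".xml")) {
            System.out.println(name);
        }
    }
}
```

Each name returned can be handed to getResource("/" + name) as in the
snippet above. This only covers the jarred case; when the program runs
unjarred, File.listFiles() would still be needed.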

    Erik Søe Sørensen

[1] I have often found it useful to search the mailing lists of the world when
I had a hard problem - but much of the time the problem is mentioned, but not
the answer. That's why I'm recapping the solution here...
--
Lights. Panic. Action.

