cocoon-dev mailing list archives

From Donald Ball
Subject Re: problems parsing xml with dtd from a foreign source
Date Tue, 19 Mar 2002 06:14:12 GMT
On Sun, 10 Mar 2002, David Crossley wrote:

> > hey guys. i'm trying to retrieve some xml content over http to begin one
> > of my pipelines:
> >
> > /nlm/query?author=Smith
> >
> > <map:match pattern="nlm/query">
> >   <map:match type="request" pattern="author">
> >     <map:generate src=";mode=XML&amp;dispmax=999&amp;term={1}[au]"/>
> >     <map:serialize type="xml"/>
> >   </map:match>
> > </map:match>
> >
> > the xml returned from the nih server will begin like so:
> >
> > <?xml version="1.0"?>
> > <!DOCTYPE QueryResult PUBLIC "-//NLM//DTD QueryResult, 22 Jan 2002//EN"
> > "/entrez/query/DTD/pmqty_020122.dtd" >
> > <QueryResult>
> >
> > unfortunately, i get an exception when cocoon tries to parse this
> > document. it claims that it cannot access the dtd:
> >
> > no protocol:
> > /entrez/query/DTD/pmqty_020122.dtd


> Good to hear that the entity catalogs worked for you.
> I think that the reason you cannot do without the
> entity catalog resolver is that the document type declaration
> in the XML instance document is not using a full URL, i.e.
> So the parser is trying to find the DTD at the root of your
> local filesystem, i.e. /entrez/qu...
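The exception text in the original report matches what `java.net.URL` produces when it is given a path that has no scheme and no base URL to resolve against. This thread does not confirm where in the pipeline that happens. The sketch below assumes the DTD's system identifier reaches `java.net.URL` without the document's own URL as context. `NoBaseDemo` and `describe` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoBaseDemo {
    // Hypothetical helper: treat a system identifier as a URL with no base,
    // which is what a resolver ends up doing if the document's location is lost.
    static String describe(String systemId) {
        try {
            return "resolved: " + new URL(systemId);
        } catch (MalformedURLException e) {
            // A root-relative path has no scheme, so URL rejects it outright.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints: no protocol: /entrez/query/DTD/pmqty_020122.dtd
        System.out.println(describe("/entrez/query/DTD/pmqty_020122.dtd"));
    }
}
```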

but it shouldn't do that. according to the xml spec on system ids:

"Unless otherwise provided by information outside the scope of this
specification (e.g. a special XML element type defined by a particular
DTD, or a processing instruction defined by a particular application
specification), relative URIs are relative to the location of the resource
within which the entity declaration occurs."

the location of the resource in this case is clearly its url:;mode=XML&amp;dispmax=999&amp;term={1}[au]

and that's the context in which the system identifier should be resolved,
right? (i could easily be wrong; i'm a little sketchy on the doctype
stuff. the spec seems clear enough on this point to me, though.)
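Applied to the DOCTYPE quoted above, the spec's rule works like this: a root-relative system identifier keeps the scheme and host of the URL the document came from and replaces only the path. The sketch below uses `java.net.URI`, which implements the RFC 2396 resolution rules. The real NIH query URL is elided in this archive, so `http://www.example.org/entrez/query.fcgi?cmd=Search&db=PubMed` is a hypothetical stand-in, and `RelativeSystemId` is an illustrative name.

```java
import java.net.URI;

public class RelativeSystemId {
    // Resolve a DOCTYPE system identifier against the URL the document was
    // fetched from, using java.net.URI's RFC 2396 resolution.
    static String resolve(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the (elided) query URL.
        String doc = "http://www.example.org/entrez/query.fcgi?cmd=Search&db=PubMed";

        // Root-relative, as in the DOCTYPE: same scheme and host, new path.
        // prints http://www.example.org/entrez/query/DTD/pmqty_020122.dtd
        System.out.println(resolve(doc, "/entrez/query/DTD/pmqty_020122.dtd"));

        // Path-relative, for contrast: resolved against the document's directory.
        // prints http://www.example.org/entrez/pmqty_020122.dtd
        System.out.println(resolve(doc, "pmqty_020122.dtd"));
    }
}
```

In neither case does the result point at the local filesystem. The query string of the document URL is dropped because the relative reference supplies its own path.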

if so, then while entity catalogs are a nice workaround, they don't work
unless you know in advance the dtd of the remote xml and also know that
it's not going to change. otherwise, your webapp can break without notice.
that's not cool! i'm sorry that i've not been able to come up with a patch
for this; i can't figure out which component is at fault. any clues?
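The thread does not show which Cocoon component drops the document's location. The sketch below uses plain JAXP, not Cocoon's API, to illustrate what appears to be the missing piece. If the parser receives an `InputSource` whose system id is the URL the content was fetched from, the parser itself resolves the relative DTD reference against that host. The SAX `EntityResolver` contract requires the parser to resolve a URL system identifier fully before passing it to the resolver, so the recorded id shows where the parser would fetch the DTD. `BaseAwareParse`, `requestedDtds` and the example.org URL are hypothetical. The resolver returns an empty DTD only so the example runs offline.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BaseAwareParse {
    // The prolog quoted in the message, with a stub root element.
    static final String DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE QueryResult PUBLIC \"-//NLM//DTD QueryResult, 22 Jan 2002//EN\"\n"
        + "\"/entrez/query/DTD/pmqty_020122.dtd\" >\n"
        + "<QueryResult/>\n";

    // Parses DOC as though it had been fetched from documentUrl and returns
    // the DTD system ids that the parser passed to the entity resolver.
    static List<String> requestedDtds(String documentUrl) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                seen.add(systemId);
                // Empty DTD so the sketch never touches the network.
                return new InputSource(new StringReader(""));
            }
        };
        InputSource in = new InputSource(new StringReader(DOC));
        // The key step: tell the parser where the document came from, so
        // relative system ids have a base to resolve against.
        in.setSystemId(documentUrl);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the elided query URL.
        String doc = "http://www.example.org/entrez/query.fcgi?cmd=Search&db=PubMed";
        System.out.println(requestedDtds(doc));
    }
}
```

If this reasoning holds, the fix belongs wherever the generator builds its `InputSource` from the fetched stream. A catalog is then needed only to avoid the network fetch, not to make parsing work.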

- donald
