cocoon-dev mailing list archives

From David Crossley <cross...@indexgeo.com.au>
Subject Re: problems parsing xml with dtd from a foreign source
Date Sat, 09 Mar 2002 23:43:45 GMT
Donald Ball wrote:
> (sent to cocoon-users, no help there...)
> 
> hey guys. i'm trying to retrieve some xml content over http to begin one
> of my pipelines:
> 
> /nlm/query?author=Smith
> 
> <map:match pattern="nlm/query">
>   <map:match type="request" pattern="author">
>     <map:generate src="http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=PubMed&amp;mode=XML&amp;dispmax=999&amp;term={1}[au]"/>
>     <map:serialize type="xml"/>
>   </map:match>
> </map:match>
> 
> the xml returned from the nih server will begin like so:
> 
> <?xml version="1.0"?>
> <!DOCTYPE QueryResult PUBLIC "-//NLM//DTD QueryResult, 22 Jan 2002//EN"
> "/entrez/query/DTD/pmqty_020122.dtd" >
> <QueryResult>
> 
> unfortunately, i get an exception when cocoon tries to parse this
> document. it claims that it cannot access the dtd:
> 
> java.net.MalformedURLException: no protocol:
> /entrez/query/DTD/pmqty_020122.dtd
> 	at java.net.URL.<init>(URL.java:473)
> 	at java.net.URL.<init>(URL.java:376)
> 	at java.net.URL.<init>(URL.java:330)
> 	at
> org.apache.xerces.impl.XMLEntityManager.startEntity(XMLEntityManager.java:731)
> 	at
> org.apache.xerces.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:691)
> 	at
> org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:258)
> 	at
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(XMLDocumentScannerImpl.java:811)
> 	at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
> 	at
> org.apache.xerces.parsers.StandardParserConfiguration.parse(StandardParserConfiguration.java:525)
> 	at
> org.apache.xerces.parsers.StandardParserConfiguration.parse(StandardParserConfiguration.java:581)
> 	at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:147)
> 	at
> org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1157)
> 	at
> org.apache.avalon.excalibur.xml.JaxpParser.parse(JaxpParser.java:241)
> 	at
> org.apache.cocoon.components.source.AbstractStreamSource.toSAX(AbstractStreamSource.java:204)
> 	at
> org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:142)
> 
> shouldn't it be trying to download the DTD from this url:
> 
> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pmqty_020122.dtd
> 
> where it does, in fact, live?
> 
> i did manage to work around this problem using the excellent entity
> catalogs facility, and i suspect that's what we'll want to use in the long
> term, but i would like to track down why this isn't working as (i think)
> it ought to. thanks in advance.
> 
> - donald

Good to hear that the entity catalogs worked for you.
I think the reason you cannot do without the entity
catalog resolver is that the document type declaration
in the XML instance document does not use a full URL, i.e.
http://www.ncbi.nlm.nih.gov/entrez/qu...
So the parser is trying to find the DTD at the root of your
local filesystem, i.e. /entrez/qu...
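
To illustrate the difference, here is a minimal sketch using plain
java.net.URL, the class at the top of the stack trace. It is not Cocoon
or Xerces code. The class name DtdSystemIdDemo and the BASE constant are
made up for the example, and BASE assumes the address the instance
document was fetched from (minus the query string). The sketch only
shows that the system identifier from the DOCTYPE is not a URL on its
own, yet resolves to the address Donald expected once a base URL is
available:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; the URL constructors are deprecated on recent JDKs
// but are used here because they are what the stack trace shows.
@SuppressWarnings("deprecation")
public class DtdSystemIdDemo {
    // System identifier exactly as it appears in the DOCTYPE declaration.
    static final String SYSTEM_ID = "/entrez/query/DTD/pmqty_020122.dtd";

    // Assumed base: where the instance document was fetched from.
    static final String BASE = "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi";

    // True when the system identifier alone cannot be turned into a URL
    // (the "no protocol" case).
    static boolean failsWithoutBase() {
        try {
            new URL(SYSTEM_ID);
            return false;
        } catch (MalformedURLException e) {
            return true;
        }
    }

    // Resolves the system identifier against the base URL.
    static String resolveAgainstBase() throws MalformedURLException {
        return new URL(new URL(BASE), SYSTEM_ID).toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fails without base: " + failsWithoutBase());
        System.out.println("resolved with base: " + resolveAgainstBase());
    }
}
```

Whether the parser ever sees such a base depends on how the input is
handed to it, which this sketch does not cover.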
--David


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

