abdera-user mailing list archives

From "James Abley" <james.ab...@gmail.com>
Subject Re: DOCTYPE declaration causing WstxUnexpectedCharException
Date Thu, 13 Nov 2008 10:44:36 GMT
2008/11/13 Bruce Snyder <bruce.snyder@gmail.com>:
> I'm using the Abdera API to grab Atom feeds. I've tried a few
> different Atom feeds and I'm getting the following exception with all
> of them:
>
> ---------------------------------------------------------------------------------------------------
> Exception in thread "main" org.apache.abdera.parser.ParseException:
> org.apache.abdera.parser.ParseException:
> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '-'
> (code 45) in external DTD subset; expected closing '>' after ENTITY
> declaration
>  at [row,col,system-id]: [81,5,"http://www.w3.org/TR/html4/strict.dtd"]
>  from [row,col {unknown-source}]: [1,1]
>        at org.apache.abdera.protocol.client.AbstractClientResponse.getDocument(AbstractClientResponse.java:132)
>        at org.apache.abdera.protocol.client.AbstractClientResponse.getDocument(AbstractClientResponse.java:96)
>        at org.apache.abdera.protocol.client.AbstractClientResponse.getDocument(AbstractClientResponse.java:74)
>        at com.sonatype.feedeater.FeedEater.grabUris(FeedEater.java:52)
>        at com.sonatype.feedeater.FeedEater.run(FeedEater.java:41)
>        at com.sonatype.feedeater.FeedEater.main(FeedEater.java:34)
> Caused by: org.apache.abdera.parser.ParseException:
> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '-'
> (code 45) in external DTD subset; expected closing '>' after ENTITY
> declaration
>  at [row,col,system-id]: [81,5,"http://www.w3.org/TR/html4/strict.dtd"]
>  from [row,col {unknown-source}]: [1,1]
>        at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:260)
>        at org.apache.abdera.parser.stax.FOMBuilder.getFomDocument(FOMBuilder.java:333)
>        at org.apache.abdera.parser.stax.FOMParser.getDocument(FOMParser.java:72)
>        at org.apache.abdera.parser.stax.FOMParser.parse(FOMParser.java:207)
>        at org.apache.abdera.parser.stax.FOMParser.parse(FOMParser.java:145)
>        at org.apache.abdera.protocol.client.AbstractClientResponse.getDocument(AbstractClientResponse.java:119)
>        ... 5 more
> Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
> character '-' (code 45) in external DTD subset; expected closing '>'
> after ENTITY declaration
>  at [row,col,system-id]: [81,5,"http://www.w3.org/TR/html4/strict.dtd"]
>  from [row,col {unknown-source}]: [1,1]
>        at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:623)
>        at com.ctc.wstx.dtd.FullDTDReader.throwDTDUnexpectedChar(FullDTDReader.java:2013)
>        at com.ctc.wstx.dtd.FullDTDReader.parseEntityValue(FullDTDReader.java:1533)
>        at com.ctc.wstx.dtd.FullDTDReader.handleEntityDecl(FullDTDReader.java:2419)
>        at com.ctc.wstx.dtd.FullDTDReader.handleDeclaration(FullDTDReader.java:2075)
>        at com.ctc.wstx.dtd.FullDTDReader.parseDirective(FullDTDReader.java:720)
>        at com.ctc.wstx.dtd.FullDTDReader.parseDTD(FullDTDReader.java:599)
>        at com.ctc.wstx.dtd.FullDTDReader.readExternalSubset(FullDTDReader.java:457)
>        at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:478)
>        at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
>        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3349)
>        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
>        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>        at org.apache.abdera.parser.stax.FOMBuilder.getNextElementToParse(FOMBuilder.java:163)
>        at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:187)
>        ... 10 more
> ---------------------------------------------------------------------------------------------------
>
> The errors seem to come from the call to
> ClientResponse.getDocument(). As far as I can tell, the Abdera API is
> having trouble with the DOCTYPE declaration and is trying to fetch
> strict.dtd. Is there a way to work around the DOCTYPE declaration?
>
> Bruce
> --
> perl -e 'print unpack("u30","D0G)U8V4\@4VYY9&5R\"F)R=6-E+G-N>61E<D\!G;6%I;\"YC;VT*"
> );'
>
> Apache ActiveMQ - http://activemq.org/
> Apache Camel - http://activemq.org/camel/
> Apache ServiceMix - http://servicemix.org/
>
> Blog: http://bruceblog.org/
>

Hi Bruce,

You're pulling down Atom feeds that have an HTML DOCTYPE? Are you sure
that they're valid Atom feeds? What does the Feed Validator [1] say?

Cheers,

James

[1] http://www.feedvalidator.org/
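
For context: the system-id in the trace, http://www.w3.org/TR/html4/strict.dtd, is the HTML 4 strict DTD, which is an SGML DTD rather than an XML one, so an XML parser that tries to read it can be expected to fail. That suggests the response body may have been an HTML page rather than an Atom document. A quick way to check before handing the body to Abdera is to peek at the root element. The following is only a minimal sketch using the standard StAX API, not Abdera's own API; the class name AtomRootCheck and the method isAtomDocument are made up for illustration, and it only recognises Atom feed and entry documents.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Abdera: peeks at the root element of a
// response body without loading any external DTD.
final class AtomRootCheck {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns true only if the first element is atom:feed or atom:entry.
    // Anything that cannot be parsed that far is treated as "not Atom".
    static boolean isAtomDocument(String body) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities, so a DOCTYPE pointing
        // at strict.dtd is skipped rather than fetched and parsed.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Extra guard: if the parser asks for an external resource anyway,
        // hand it an empty stream instead of going to the network.
        factory.setXMLResolver((publicId, systemId, baseUri, namespace) ->
                new ByteArrayInputStream(new byte[0]));

        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(body));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    return ATOM_NS.equals(reader.getNamespaceURI())
                            && ("feed".equals(name) || "entry".equals(name));
                }
            }
            return false; // no element found at all
        } catch (XMLStreamException e) {
            return false; // not well-formed enough to reach a root element
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do on close failure
                }
            }
        }
    }

    public static void main(String[] args) {
        String atom = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>t</title></feed>";
        String html = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" "
                + "\"http://www.w3.org/TR/html4/strict.dtd\">\n"
                + "<html><head><title>x</title></head><body></body></html>";
        System.out.println("atom: " + isAtomDocument(atom));
        System.out.println("html: " + isAtomDocument(html));
    }
}
```

If a check like this reports false for a URL, that URL is probably serving an HTML page (for example the blog's front page) rather than the feed itself, and running it through the validator at [1] should confirm that.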
