commons-user mailing list archives

From Simon Kitching <skitch...@apache.org>
Subject Re: Exception when parsing RSS
Date Wed, 10 Sep 2008 06:46:36 GMT
Adriano Bonat wrote:
> On Tue, Sep 9, 2008 at 5:39 PM, Simon Kitching <skitching@apache.org> wrote:
>   
>> Can you post the relevant part of the rss input text?
>>     
>
> For example:
> http://news.google.com/?output=rss&ned=en&num=50&q=test&ie=UTF-8
That isn't what I meant; I'm quite sure that google.com is generating
well-formed xml. But what is actually being passed to Digester?
>   
>> From this error message, it sure looks like the input is invalid xml.
>> And if that is the case, then there is no way to parse it with any xml
>> parser.
>>     
>
> The <description> content in Google's RSS is escaped, so "<" is
> &lt;, ">" is &gt;... so I don't understand why I'm getting that error.
>   
By the way, how do you view the raw xml from that url?
>   
>> If it is intermittent, then maybe you are getting intermittent
>> truncation of the input data stream.
>>     
>
> Hmm.. it is implemented like this:
>
> InputStreamReader isr = new
> InputStreamReader(urlConnection.getInputStream(), "UTF-8");
> BufferedReader br = new BufferedReader(isr);
>
> Channel channel = (Channel) this.rssParser.parse(br);
>
> urlConnection.disconnect();
>
> ... so, given that I'm using a BufferedReader, is this "intermittent" problem possible?
>   

It would seem so.

I would recommend reading the contents of the input stream into a String 
first, then passing that to digester. Then you can see what data is 
really being parsed.
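A minimal sketch of that approach, assuming the feed is UTF-8 as in the code quoted above. The ByteArrayInputStream stands in for urlConnection.getInputStream(), and the names ReadThenParse and readFully are hypothetical. Once the text is in a String it can be printed or logged, and then handed to Digester through a StringReader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadThenParse {

    // Reads the whole stream into memory so the exact text received can be inspected.
    // Assumes UTF-8, matching the InputStreamReader in the original code.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // Decode once at the end so multi-byte characters split across chunks stay intact.
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for urlConnection.getInputStream().
        InputStream in = new ByteArrayInputStream(
                "<rss><channel><title>test</title></channel></rss>"
                        .getBytes(StandardCharsets.UTF_8));
        String xml;
        try (InputStream stream = in) {
            xml = readFully(stream);
        }
        // Log exactly what was received before parsing it.
        System.out.println("Received " + xml.length() + " chars:");
        System.out.println(xml);
        // In the real code, the String would then go to Digester, e.g.:
        // Channel channel = (Channel) rssParser.parse(new java.io.StringReader(xml));
    }
}
```

If the parse error recurs, the logged text shows whether the document arrived complete.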

By the way, Digester does not parse the input itself. Digester is simply
a "sax event handler". The parse methods are just convenience wrappers
that create an instance of whatever xml parser is available to the jvm
(normally the one bundled with it), register the digester instance to
receive events from that parser, and then pass the input to the xml
parser. So what you are seeing is an error reported by the standard xml
parser built into your jvm; it really has nothing to do with Digester.
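To illustrate that the error comes from the parser rather than from Digester, here is a sketch of the same arrangement with Digester taken out: the JAXP SAX parser is given a DefaultHandler that ignores every event. This is not Digester's actual code, and the class name PlainSaxCheck and the sample feed are made up for illustration. A deliberately truncated document should still produce a parse error even though no handler logic runs at all, which is the kind of failure an intermittently cut-off stream would cause.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PlainSaxCheck {

    // Parses xml with the JAXP-provided SAX parser and a handler that ignores all events.
    // Returns null if the document is well-formed, otherwise the parser's error message.
    static String check(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Raised by the parser itself; the handler did nothing.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string; surface as an unchecked error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String complete = "<rss><channel><title>a &lt;b&gt; c</title></channel></rss>";
        String truncated = complete.substring(0, 30); // simulates a cut-off stream
        System.out.println("complete:  " + check(complete));
        System.out.println("truncated: " + check(truncated));
    }
}
```

The escaped &lt; and &gt; in the complete document parse fine; only the truncated copy fails.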

Regards,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

