commons-user mailing list archives

From rjn <>
Subject Re: Ignoring Specific Tags with Digester
Date Fri, 28 Jul 2006 19:31:01 GMT
Thanks for the responses.  Yeah, so the XML file is valid; it's just
that some of the tags have HTML embedded within them.  For example:

<entry><p>This is text.</p></entry>

So Digester sees this as:

Rather than just entry.  I imagine I could just download the XML
documents and, knowing the structure, search for the entry fields and
then cut out the text, then store that separately.  I was just
hoping there was a way to list tags to ignore, for example <p>,
<br>, etc.
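
For what it's worth, here is a rough sketch of that fallback idea (pull out
each entry's text and skip over whatever markup is nested inside it).  It
uses only the JDK's SAX parser rather than Digester, so it is not a Digester
feature, and the names EntryTextExtractor, EntryHandler and extractEntries
are made up for illustration.  It also assumes the element is literally
named "entry" and that the embedded HTML is well-formed XML:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Digester, plain JDK SAX only.
class EntryTextExtractor {

    /** Collects the text of each <entry>, flattening nested tags such as <p> or <br/>. */
    static class EntryHandler extends DefaultHandler {
        final List<String> entries = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        // 0 = outside <entry>, 1 = directly inside it, >1 = inside nested markup
        private int depth = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (depth > 0) {
                depth++;                 // nested tag: keep reading text, ignore the tag itself
            } else if ("entry".equals(qName)) {
                depth = 1;               // entering a new entry
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (depth == 0) {
                return;                  // closing tag outside any entry
            }
            depth--;
            if (depth == 0) {
                entries.add(text.toString().trim());   // the entry itself just closed
            }
        }
    }

    static List<String> extractEntries(String xml) throws Exception {
        EntryHandler handler = new EntryHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.entries;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractEntries("<feed><entry><p>This is text.</p></entry></feed>"));
    }
}
```

This throws the HTML tags away and keeps only the text between them, which
may or may not be what a feed reader wants; keeping the markup would need a
different approach.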

Thanks anyway,

On 7/27/06, rjn <> wrote:
> Hi Everyone,
> I'm trying to write a Syndication Feed parser using Digester, however
> I'm running into a stumbling block.  Many feeds have HTML in the
> entries such as <a>, <br>, etc.   Digester tries to parse these as XML
> tags, thus leading to blanks in the data I pull out.  I was wondering
> if there was a way to set Digester to ignore specific tags (in this
> case, the HTML tags)?
> Thanks,
> RJ

