commons-user mailing list archives

From rjn <ibg...@gmail.com>
Subject Re: Ignoring Specific Tags with Digester
Date Fri, 28 Jul 2006 19:31:01 GMT
Thanks for the responses.  Yeah, so the XML file is valid; it's just
that some of the tags have HTML embedded within them.  For example:

<entry><p>This is text.</p></entry>

So Digester sees this as:
entry/p

Rather than just entry.  I imagine I could just download the XML
documents and, knowing the structure, search for the entry fields,
cut out the text, and store that separately.  I was just hoping
there was a way to list tags to ignore, for example <p>, <br>, etc.
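
As a rough illustration of the "cut out the text" fallback, here is a
minimal sketch that uses the JDK's built-in SAX parser instead of
Digester.  The class name EntryTextExtractor and the extract() helper
are hypothetical, and it assumes the element is literally named
"entry" with no namespace prefix.  It collects all character data
inside each <entry> and simply does not react to nested tags such as
<p> or <br/>:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester: gathers the text of every
// <entry> element while ignoring any markup nested inside it.
class EntryTextExtractor extends DefaultHandler {
    private final List<String> entries = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private int depth = 0; // greater than 0 while we are inside an <entry>

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attrs) {
        if (depth > 0) {
            depth++;               // nested tag (<p>, <br/>, ...): only track depth
        } else if ("entry".equals(qName)) {
            depth = 1;             // entering an <entry>: start a fresh buffer
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            buffer.append(ch, start, length); // keep text at any nesting level
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // When depth drops back to 0, the closing </entry> has been reached.
        if (depth > 0 && --depth == 0) {
            entries.add(buffer.toString());
        }
    }

    // Parses the XML string and returns one string per <entry> element.
    static List<String> extract(String xml) throws Exception {
        EntryTextExtractor handler = new EntryTextExtractor();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.entries;
    }
}
```

Calling EntryTextExtractor.extract() on a document containing the
example entry above would give back the text of the entry without the
<p> wrapper.  This only strips the markup; it does not preserve it, so
it would not suit a case where the embedded HTML needs to be kept.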

Thanks anyway,

On 7/27/06, rjn <ibgeek@gmail.com> wrote:
> Hi Everyone,
>
> I'm trying to write a Syndication Feed parser using Digester, however
> I'm running into a stumbling block.  Many feeds have HTML in the
> entries such as <a>, <br>, etc.   Digester tries to parse these as XML
> tags, thus leading to blanks in the data I pull out.  I was wondering
> if there was a way to set Digester to ignore specific tags (in this
> case, the HTML tags)?
>
> Thanks,
> RJ
>
> --
> em: ibgeek@gmail.com
>


-- 
em: ibgeek@gmail.com
