commons-user mailing list archives

From "Craig McClanahan" <craig...@apache.org>
Subject Re: digester parsing with html content
Date Thu, 28 Sep 2006 21:35:57 GMT
On 9/27/06, Torgeir Veimo <torgeir@pobox.com> wrote:
>
> I'm trying to use digester for parsing xml that was previously
> parsed with jaxb 1.0-ea. Some of the content is xhtml fragments
> inside xml, eg.
>
> <body-text><xhtml>...</xhtml></body-text>
>
> and I'd like to retrieve the content as a String bean property.
> However, I'd like the parser to treat the content of body-text as
> opaque. Now it tries to parse it and chokes on eg. &oslash; entities.
>
> Any clues on how I can configure digester, or more precisely, the
> underlying parser, to avoid these problems?


One general strategy would be to declare all of the entities that HTML
defines by default (such as &oslash;) in the DOCTYPE of the surrounding XML
document that you are parsing.  That way, they would just get expanded (at
the XML parsing level) and not cause you any problems.
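A minimal sketch of that strategy, assuming a JAXP SAX parser from the JDK in place of Digester so the example stays self-contained (Digester itself is driven by a SAX parser, so the same DOCTYPE applies when the document goes through it). The class name, the sample document, and the `parseBodyText` helper are hypothetical. Only `oslash` is declared in the internal subset; a real document would declare (or include) the full set of HTML entities. Note that the parser still treats the XHTML inside `body-text` as elements, so this collects only its character data, not the raw markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityDoctypeDemo {

    // Hypothetical sample: the internal DTD subset declares the HTML entity
    // "oslash" so the XML parser can expand it instead of failing on it.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE article [\n"
        + "  <!ENTITY oslash \"&#248;\">\n"
        + "]>\n"
        + "<article><body-text><xhtml><p>Troms&oslash;</p></xhtml></body-text></article>";

    /** Hypothetical helper: returns the character data found inside body-text. */
    static String parseBodyText(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private int depth = 0; // greater than 0 while inside <body-text>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("body-text".equals(qName)) depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("body-text".equals(qName)) depth--;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references have already been expanded by the parser here.
                if (depth > 0) text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseBodyText(SAMPLE));
    }
}
```

Without the `<!ENTITY oslash ...>` line, the same parse call would fail with an undeclared-entity error, which is the failure described in the question.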

--
> Torgeir Veimo
> torgeir@pobox.com


Craig
