commons-dev mailing list archives

From Apache Wiki <>
Subject [Jakarta-commons Wiki] Update of "Digester/FAQ" by SimonKitching
Date Thu, 16 Nov 2006 01:47:09 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jakarta-commons Wiki" for change notification.

The following page has been changed by SimonKitching:

The comment on the change is:
Add info about handling embedded HTML

  Remember that Digester is just a layer on top of a standard XML parser, and a standard XML parser has no option to just stop parsing input at a specific element, unless it knows that the content of that element is a block of characters (CDATA).
+ == How do I get some HTML (or other non-XML data) nested within a tag as a literal string? ==
+ If you have something like:
+ {{{
+   <article>
+     <title>An article about something</title>
+     <body>
+       Some html (not XHTML) data here
+       <br>
+       And some more text.
+     </body>
+   </article>
+ }}}
+ then this simply 'cannot' be processed by Digester. Digester is a layer on top of a standard XML parser, and because this input is not well-formed XML (the <br> tag is never closed), the underlying parser will reject it.
+ Your best option is to wrap the non-XML content in a CDATA section (see the preceding FAQ entry). If you absolutely cannot change the input format (despite it not being valid XML at all), then you may be able to pre-process the data first with something like the CyberNeko HTML Parser (NekoHTML) library, which converts HTML into XHTML.
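
The behaviour described in the FAQ entry above can be checked without Digester at all, because the rejection comes from the underlying XML parser. The following is a minimal sketch that uses only the JDK's built-in JAXP parser. The class name EmbeddedHtmlDemo and its helper methods are made up for illustration and are not part of Digester. It parses the article once as-is and once with the body wrapped in a CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names here are illustrative only.
class EmbeddedHtmlDemo {

    // The article from the FAQ entry: <br> is never closed, so this is not well-formed XML.
    static final String RAW =
          "<article>"
        + "<title>An article about something</title>"
        + "<body>Some html (not XHTML) data here<br>And some more text.</body>"
        + "</article>";

    // The same article with the HTML wrapped in a CDATA section.
    static final String WRAPPED =
          "<article>"
        + "<title>An article about something</title>"
        + "<body><![CDATA[Some html (not XHTML) data here<br>And some more text.]]></body>"
        + "</article>";

    // Parses the input and returns the character content of the first <body> element.
    static String bodyText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("body").item(0).getTextContent();
    }

    // Returns false when the parser rejects the input as not well-formed.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            bodyText(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed(RAW));   // prints "false"
        System.out.println(bodyText(WRAPPED));   // prints the HTML, including the literal <br>
    }
}
```

With the CDATA wrapper in place, the parser hands the HTML through as plain character data, so a rule that collects an element's body text should receive it as one literal string.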
