From: robert burrell donkin
Date: Mon, 24 May 2004 20:49:34 +0100
To: "Jakarta Commons Users List"
Subject: Re: [digester] reading embedded HTML (or other mixed text)

Another alternative might be to use nekoHTML (http://www.apache.org/~andyc/)
to bootstrap a SAX pipeline.

- robert

On 23 May 2004, at 23:44, Adrian Sutton wrote:

> Sounds like you may want to run the HTML section through JTidy
> (http://jtidy.sourceforge.net) to convert it to XHTML first. Then
> Digester should be able to at least parse it.
>
> Regards,
>
> Adrian Sutton.
>
> -----Original Message-----
> From: Simon Kitching [mailto:simon@ecnetwork.co.nz]
> Sent: Monday, 24 May 2004 8:39 AM
> To: Jakarta Commons Users List
> Subject: Re: [digester] reading embedded HTML (or other mixed text)
>
> On Fri, 2004-05-21 at 12:34, Bill Keese wrote:
>> Is there any way to tell Digester to read in the entire content of an
>> element (including text and sub-elements) as a single String? For
>> example, if I persist e-mail to XML, I'd like to use Digester to read
>> the e-mail address list, etc., but the HTML content of the mail should
>> be read verbatim.
>>
>
> Hi Bill,
>
> HTML is generally not well-formed XML.
> Digester uses a standard XML parser to parse the input, so it cannot
> process an input document that is not well-formed XML.
>
> As Jose has said in a separate reply, you could wrap your HTML in a
> CDATA section in the input document. The XML parser will then see the
> contents of that CDATA section as just a text string, and so will
> Digester.
>
> Alternatively, you could use XHTML, which most browsers support. In
> that case, you could use NodeCreateRule.
>
> Regards,
>
> Simon
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
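A rough sketch of what "read the element verbatim" could look like at the SAX level. Digester is itself built on SAX, and a nekoHTML pipeline as suggested above would also deliver SAX events. This uses only the JDK's javax.xml.parsers SAX API, not Digester's own rules. The class InnerContentCapture and the method captureInner are hypothetical names, and the sketch assumes the input is already well-formed XML (for example, XHTML produced by JTidy). It writes everything between the start and end tags of one chosen element back out as a single String.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: captures the inner markup of one element as a String
// using plain JDK SAX (not Digester's API). Input must be well-formed XML.
class InnerContentCapture {

    static class CaptureHandler extends DefaultHandler {
        private final String target;                 // name of the element to capture
        private final StringBuilder buf = new StringBuilder();
        private int depth = 0;                       // 0 = outside target, >= 1 = inside it
        private String captured;                     // inner content of the last match

        CaptureHandler(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (depth > 0) {
                // Nested element: write its start tag and attributes back out.
                buf.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    buf.append(' ').append(atts.getQName(i)).append("=\"")
                       .append(escape(atts.getValue(i), true)).append('"');
                }
                buf.append('>');
                depth++;
            } else if (qName.equals(target)) {
                depth = 1;                           // entering the target: start a fresh buffer
                buf.setLength(0);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (depth == 0) {
                return;                              // outside the target: ignore
            }
            depth--;
            if (depth == 0) {
                captured = buf.toString();           // the target's own end tag is not included
            } else {
                buf.append("</").append(qName).append('>');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                // The parser has already decoded entities, so re-escape before storing.
                buf.append(escape(new String(ch, start, length), false));
            }
        }
    }

    static String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns the inner content of the last {@code element}, or null if it never appears. */
    static String captureInner(String xml, String element) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CaptureHandler handler = new CaptureHandler(element);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.captured;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<mail><to>bill@example.com</to>"
                + "<body><p>Hello <b>Bill</b></p></body></mail>";
        System.out.println(captureInner(xml, "body"));  // prints <p>Hello <b>Bill</b></p>
    }
}
```

Some detail is lost in the round trip. Empty elements come back as an open/close pair (<br/> becomes <br></br>), and comments and processing instructions are dropped. Text from a CDATA section is escaped like any other text, because SAX delivers it through characters(). If several elements match, the last one wins.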