From: "Chuck Williams"
To: "Lucene Users List"
Subject: RE: HTMLParser vs NekoHTML(indexig HTML files)
Date: Mon, 27 Dec 2004 10:30:20 -0800

I can't comment on the comparison, but I can report that I use the NekoHTML parser and like it. It's convenient because it is an extension of Xerces that uses the same standard APIs.
It automatically closes and balances tags, so the resulting tree is well-structured, like XML. So far it has parsed everything I've pointed it at except for one document (I haven't yet figured out what's unique about that document). It is hosted on the Apache site, and although it is not officially part of Apache, it may become a subproject of Xerces, which would bode well for its standing.

Chuck

> -----Original Message-----
> From: Daniel Cortes [mailto:dcortes@fib.upc.edu]
> Sent: Monday, December 27, 2004 8:16 AM
> To: lucene-user@jakarta.apache.org
> Subject: HTMLParser vs NekoHTML(indexig HTML files)
>
> What do you prefer? And more important, why?
> Someone told me that Neko is more powerful because of something
> related to XML, but I didn't understand.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
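
To make the "same standard APIs" point above concrete, here is a rough sketch of pulling indexable text out of a DOM tree using only org.w3c.dom calls. The class and method names (DomTextExtractor, extractText, parseWellFormed) are made up for illustration. To keep the sketch runnable without extra jars, the tree is built by the JDK's own XML parser from already well-formed markup; the assumption is that a Document produced by NekoHTML's DOMParser could be passed to extractText in the same way, since it exposes the same DOM interfaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTextExtractor {

    /** Returns the text under the given node, whitespace-normalized, skipping script and style. */
    public static String extractText(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue()).append(' ');
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Compare case-insensitively: an HTML parser may report
            // element names in upper case depending on its configuration.
            String name = node.getNodeName();
            if (name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style")) {
                return;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    /**
     * Stand-in for an HTML parser: builds a DOM from markup that is already
     * well-formed. With NekoHTML on the classpath, the Document would instead
     * come from its DOMParser (assumption, not shown here).
     */
    public static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String markup = "<html><head><title>Hello</title><style>p{}</style></head>"
                + "<body><p>Lucene <b>rocks</b></p></body></html>";
        System.out.println(extractText(parseWellFormed(markup)));
    }
}
```

The resulting string is what would typically be handed to the indexer as the document's body text; real-world HTML would of course need the tag balancing described above before a tree walk like this is possible.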