lucene-java-user mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: HTMLParser vs NekoHTML(indexig HTML files)
Date Mon, 27 Dec 2004 18:30:20 GMT
I can't comment on the comparison, but can report that I use the
NekoHTML parser and like it.  It's convenient because it is an extension
of Xerces and uses the same standard APIs.  It automatically closes and
balances tags, so the resulting tree is well-structured like XML.  So far
it has parsed everything I've pointed it at, except for one document
(I haven't figured out what's unique about that document yet).  It is
hosted on the Apache site and, although not officially part of Apache, it
may become a subproject of Xerces, which would bode well for its
standing.

Chuck
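
To illustrate the "same standard APIs" point, here is a minimal sketch of
pulling indexable text out of a DOM tree using only org.w3c.dom.  It is not
taken from the message above.  As a stand-in for NekoHTML it uses the JDK's
own JAXP parser on markup that is already balanced; with NekoHTML the
Document would presumably come from its Xerces-based DOM parser instead,
and the tree-walking code should stay the same.  The class and method names
(DomTextExtractor, collectText, extractText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTextExtractor {

    /** Appends the trimmed text of every text node under 'node', skipping script and style. */
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Compare case-insensitively, since an HTML parser may report upper-case names.
            String name = node.getNodeName();
            if (name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style")) {
                return;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    /** Stand-in parse step: JAXP on well-formed markup (assumption; not NekoHTML itself). */
    static String extractText(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><head><title>Hello</title><style>p {}</style></head>"
                + "<body><p>Indexing <b>HTML</b> files</p></body></html>";
        System.out.println(extractText(markup)); // prints "Hello Indexing HTML files"
    }
}
```

The resulting string is the kind of value one might put into a Lucene
text field; because the walker depends only on the DOM interfaces, it
should not care which parser built the tree.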

  > -----Original Message-----
  > From: Daniel Cortes [mailto:dcortes@fib.upc.edu]
  > Sent: Monday, December 27, 2004 8:16 AM
  > To: lucene-user@jakarta.apache.org
  > Subject: HTMLParser vs NekoHTML(indexig HTML files)
  > 
  > What do you prefer? And more important, why?
  > Someone told me that Neko is more powerful because of something
  > related to XML, but I didn't understand.
  >
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
  > For additional commands, e-mail: lucene-user-help@jakarta.apache.org



