lucene-java-user mailing list archives

From: Otis Gospodnetic
Subject: Re: HTML text extraction
Date: Wed, 21 Jun 2006 06:36:34 GMT

I also wrote about using NekoHTML, I think.  I prefer that to JTidy.  That also tells you
what Simpy uses.


----- Original Message ----
From: John Wang
Sent: Wednesday, June 21, 2006 1:39:41 AM
Subject: HTML text extraction

Can someone please suggest an HTML text extraction library? The Lucene
book recommends Tidy, but JTidy does not seem to be actively maintained.

Otis, what do you guys use at Simpy?


