lucene-java-user mailing list archives

From "John Wang" <john.w...@gmail.com>
Subject HTML text extraction
Date Wed, 21 Jun 2006 05:39:41 GMT
Can someone please suggest an HTML text extraction library? The Lucene
book recommends Tidy, but JTidy does not seem to be actively maintained.

Otis, what do you guys use at Simpy?

Thanks

-john
