lucene-java-user mailing list archives

From "Liao Xuefeng" <askxuef...@gmail.com>
Subject RE: HTML text extraction
Date Wed, 21 Jun 2006 12:06:39 GMT
Hi,
I wrote my own HTML parser to do html2text conversion, and it works well. I can
send you my code if it meets your requirements.

-----Original Message-----
From: John Wang [mailto:john.wang@gmail.com] 
Sent: Wednesday, June 21, 2006 1:40 PM
To: java-user@lucene.apache.org
Subject: HTML text extraction

Can someone please suggest an HTML text extraction library? The Lucene
book recommends Tidy, but JTidy does not seem to be actively maintained.

Otis, what do you guys use at Simpy?

Thanks

-john

