lucene-java-user mailing list archives

From "张瑾" <jimi...@jimijin.com>
Subject Re: HTML text extraction
Date Thu, 22 Jun 2006 06:51:38 GMT
Please send it to me. Thanks very much!

2006/6/21, Liao Xuefeng <askxuefeng@gmail.com>:
>
> Hi,
> I wrote my own HTML parser to do html2text, and it works well. I can send
> you my code if it matches your requirements.
>
> -----Original Message-----
> From: John Wang [mailto:john.wang@gmail.com]
> Sent: Wednesday, June 21, 2006 1:40 PM
> To: java-user@lucene.apache.org
> Subject: HTML text extraction
>
> Can someone please suggest an HTML text extraction library? The Lucene
> book recommends Tidy, but JTidy does not seem to be actively maintained.
>
> Otis, what do you guys use at Simpy?
>
> Thanks
>
> -john
>
