lucene-java-user mailing list archives

From "d rj" <drjat...@gmail.com>
Subject Re: how to write this regexps?
Date Tue, 29 Aug 2006 13:10:03 GMT
I would recommend using the open source project HTMLParser
(http://htmlparser.sourceforge.net/). It provides an excellent API for
parsing HTML files and extracting the relevant text.
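Below is a minimal sketch of the same idea that relies only on the JDK. It does not use HTMLParser's API. Instead it swaps in the Swing HTML parser that ships with Java (`javax.swing.text.html.parser.ParserDelegator`), so it runs without extra jars. The class and method names (`HtmlTextExtractor`, `extractText`) are hypothetical. The sketch collects the character data between tags and skips `<script>` and `<style>` bodies. Images and Flash `<object>`/`<embed>` tags contribute nothing because they carry no text of their own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: pulls the readable text out of an HTML page before indexing. */
final class HtmlTextExtractor {

    private HtmlTextExtractor() {}

    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Greater than zero while the parser is inside <script> or <style>.
            private int hiddenDepth = 0;

            private boolean isHidden(HTML.Tag tag) {
                return tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE;
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (isHidden(tag)) {
                    hiddenDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (isHidden(tag) && hiddenDepth > 0) {
                    hiddenDepth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Only character data between tags arrives here; <img>, <object>
                // and <embed> carry no text, so they never contribute.
                if (hiddenDepth == 0) {
                    text.append(data).append(' ');
                }
            }
        };
        try {
            // true: ignore any <meta> charset, since the String is already decoded.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Collapse the whitespace left between text fragments.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

The returned string is what you would then hand to Lucene as a field value. Swing's parser is built on the HTML 3.2 DTD, so for messy real-world pages the HTMLParser library recommended here may be the sturdier choice.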
-drj

On 8/29/06, James liu <liuping.james@gmail.com> wrote:
>
> I want to index HTML, but the pages contain images, Flash, and JavaScript,
> and I want indexing to be fast.
>
> But I don't know how to get the text-only content.
>
> Can anyone help me?
>
>
