lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "shrinath.m" <shrinat...@webyog.com>
Subject Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?
Date Fri, 11 Mar 2011 11:11:54 GMT
Thank you Li Li.

Two questions :

1. Is there anything *in* *Lucene* that I need to know of ? some contrib
module or anything as such ?
2. You ran a search in java-source.net for me, thanks for that, but do you
mind telling me which is the easiest and fastest ??

On Fri, Mar 11, 2011 at 4:38 PM, Li Li [via Lucene] <
ml-node+2664327-2139887543-376162@n3.nabble.com> wrote:

> http://java-source.net/open-source/html-parsers
>
> 2011/3/11 shrinath.m <[hidden email]<http://user/SendEmail.jtp?type=node&node=2664327&i=0&by-user=t>>
>
>
> > I am trying to index content withing certain HTML tags, how do I index it
> ?
> > Which is the best parser/tokenizer available to do this ?
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2664316.html<http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2664316.html?by-user=t>
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]<http://user/SendEmail.jtp?type=node&node=2664327&i=1&by-user=t>
> > For additional commands, e-mail: [hidden email]<http://user/SendEmail.jtp?type=node&node=2664327&i=2&by-user=t>
> >
> >
>
>
> ------------------------------
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2664327.html
>  To unsubscribe from Which is the +best +fast HTML parser/tokenizer that I
> can use with Lucene for indexing HTML content today ?, click here<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=2664316&code=c2hyaW5hdGgubUB3ZWJ5b2cuY29tfDI2NjQzMTZ8LTIxMzY3ODQ0ODI=>.
>
>



-- 
Regards
Shrinath.M


--
View this message in context: http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2664331.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message