lucene-java-user mailing list archives

From "Bhavin Pandya" <bhav...@rediff.co.in>
Subject Re: Indexing HTML pages and phrases
Date Thu, 15 Mar 2007 06:37:02 GMT
Hi Maryam,

You can index the content of a specific field as UN_TOKENIZED. The whole
field value is then stored as a single term, so a search on that field
matches only the complete phrase, not its individual tokens.
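
To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model, not the Lucene API: the class PhraseFieldSketch and its methods are hypothetical names used only for illustration. In Lucene 2.x itself the field would presumably be created with something like new Field("title", value, Field.Store.YES, Field.Index.UN_TOKENIZED) and matched with a TermQuery on the exact value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT the Lucene API. It mimics how an untokenized
// field keeps the whole value as one term, while a tokenized field splits it.
class PhraseFieldSketch {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the whole value as a single term (analogous to UN_TOKENIZED).
    void addUntokenized(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Index each whitespace-separated, lower-cased word as its own term.
    void addTokenized(int docId, String value) {
        for (String token : value.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup; returns an empty set when the term is unknown.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }
}
```

With a value such as "information retrieval" added through addUntokenized, a lookup for the full string finds the document, while a lookup for "information" alone finds nothing. The same value added through addTokenized is found by either word.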
To index HTML pages you can use any HTML parser to extract the text first.
This one may be useful to you:
http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/html/HTMLParser.html
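
As a rough illustration of pulling the text of specific tags out of a page before indexing, the sketch below uses a regular expression from the Java standard library. It does not use the HTMLParser class linked above, the class name TagTextSketch is hypothetical, and a regex is only adequate for simple, well-formed markup without same-name nesting. A real HTML parser is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive sketch: extracts the text inside every <tag>...</tag> pair.
// Assumes simple markup; same-name nested tags are not handled.
class TagTextSketch {
    static List<String> textOf(String html, String tag) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tag) + "\\b[^>]*>(.*?)</" + Pattern.quote(tag) + ">",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Drop any inner tags, then collapse runs of whitespace.
            String text = m.group(1)
                    .replaceAll("<[^>]+>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
            out.add(text);
        }
        return out;
    }
}
```

Each extracted string could then go into its own field of the document, for example one field per tag name, so that the text of a given tag can be searched separately.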

Thanks.
Bhavin pandya


----- Original Message ----- 
From: "Maryam" <mkar160@yahoo.com>
To: <java-user@lucene.apache.org>
Sent: Thursday, March 15, 2007 7:55 AM
Subject: Indexing HTML pages and phrases


> Hi,
>
> I am wondering if we can index a phrase (not a term) in
> Lucene? Also, I am not sure whether it can index HTML
> pages. I need access to the text of some of the
> tags, and I am not sure if this can be done in Lucene. I
> would be glad if you could help me with this.
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 

