lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven A Rowe <sar...@syr.edu>
Subject RE: Analyzer
Date Mon, 29 Nov 2010 14:02:53 GMT
Hi Manjula,

It's not terribly clear what you're doing here - I got lost in your description of your (two?
or maybe four?) classes.  Sometimes things are easier to understand if you provide more concrete
detail.

I suspect that you could benefit from reading the book Lucene in Action, 2nd edition: 

   http://www.manning.com/hatcher3/

You would also likely benefit from using Luke, the Lucene index browser, to better understand
your indexes' contents and debug how queries match documents: 

   http://code.google.com/p/luke/

I think your question is whether you're using Analyzers correctly.  It sounds like you are
creating two separate indexes (one for each of your classes), and you're using SnowballAnalyzer
on the indexing side for both indexes, and StandardAnalyzer on the query side.  

The usual advice is to use the same Analyzer on both the query and the index side.  But it
appears to be the case that you are taking stemmed index terms from your index #1 and then
querying index #2 using these stemmed terms.  If this is true, then you want the query-time
analyzer in your second index not to change the query terms.  You'll likely get better results
using WhitespaceAnalyzer, which tokenizes on whitespace and does no further analysis, rather
than StandardAnalyzer.

Steve

> -----Original Message-----
> From: manjula wijewickrema [mailto:manjula53@gmail.com]
> Sent: Monday, November 29, 2010 4:32 AM
> To: java-user@lucene.apache.org
> Subject: Analyzer
> 
> Hi,
> 
> In my work, I am using Lucene and two java classes. In the first one, I
> index a document and in the second one, I try to search the most relevant
> document for the indexed document in the first one. In the first java
> class,
> I use the SnowballAnalyzer in the createIndex method and StandardAnalyzer
> in
> the searchIndex method and pass the highest frequency terms into the
> second
> Java class. In the second class, I use SnowballAnalyzer in the createIndex
> method (this index is for the collection of documents to be searched, or
> it
> is my database) and StandardAnalyser in the searchIndex method (I pass the
> highest frequently occuring term of the first class as the search term
> parameter to the searchIndex method of the second class). Using Analyzers
> in
> this manner, what I am willing is to do the stemming, stop-words in both
> indexes (in both classes) and to search those a few high frequency words
> (of
> the first index) in the second index. So, if my intention is clear to you,
> could you please let me know whether it is correct or not the way I have
> used Analyzers? I highly appreciate any comment.
> 
> Thanx.
> Manjula.
Mime
View raw message