lucene-java-user mailing list archives

From manjula wijewickrema <manjul...@gmail.com>
Subject Re: Analyzer
Date Tue, 30 Nov 2010 05:06:33 GMT
Hi Steve,

Thanks a lot for your reply. Yes, there are only two classes, and you have
understood the problem correctly. As you suggested, I tried
WhitespaceAnalyzer for querying (instead of StandardAnalyzer), and it seems
to give better results than StandardAnalyzer. Could you please let me know
what the differences are between StandardAnalyzer and WhitespaceAnalyzer?
I highly appreciate your response.
Thanks.

Manjula.


On Mon, Nov 29, 2010 at 7:32 PM, Steven A Rowe <sarowe@syr.edu> wrote:

> Hi Manjula,
>
> It's not terribly clear what you're doing here - I got lost in your
> description of your (two? or maybe four?) classes.  Sometimes things are
> easier to understand if you provide more concrete detail.
>
> I suspect that you could benefit from reading the book Lucene in Action,
> 2nd edition:
>
>   http://www.manning.com/hatcher3/
>
> You would also likely benefit from using Luke, the Lucene index browser, to
> better understand your indexes' contents and debug how queries match
> documents:
>
>   http://code.google.com/p/luke/
>
> I think your question is whether you're using Analyzers correctly.  It
> sounds like you are creating two separate indexes (one for each of your
> classes), and you're using SnowballAnalyzer on the indexing side for both
> indexes, and StandardAnalyzer on the query side.
>
> The usual advice is to use the same Analyzer on both the query and the
> index side.  But it appears to be the case that you are taking stemmed index
> terms from your index #1 and then querying index #2 using these stemmed
> terms.  If this is true, then you want the query-time analyzer in your
> second index not to change the query terms.  You'll likely get better
> results using WhitespaceAnalyzer, which tokenizes on whitespace and does no
> further analysis, rather than StandardAnalyzer.
>
> Steve
>
> > -----Original Message-----
> > From: manjula wijewickrema [mailto:manjula53@gmail.com]
> > Sent: Monday, November 29, 2010 4:32 AM
> > To: java-user@lucene.apache.org
> > Subject: Analyzer
> >
> > Hi,
> >
> > In my work, I am using Lucene and two Java classes. In the first one, I
> > index a document, and in the second one, I try to find the document most
> > relevant to the one indexed in the first. In the first class, I use
> > SnowballAnalyzer in the createIndex method and StandardAnalyzer in the
> > searchIndex method, and pass the highest-frequency terms to the second
> > class. In the second class, I use SnowballAnalyzer in the createIndex
> > method (this index is for the collection of documents to be searched,
> > i.e. my database) and StandardAnalyzer in the searchIndex method (I pass
> > the most frequently occurring term from the first class as the search
> > term parameter to the searchIndex method of the second class). By using
> > Analyzers in this manner, I intend to apply stemming and stop-word
> > removal in both indexes (in both classes) and to search for those few
> > high-frequency words (of the first index) in the second index. So, if my
> > intention is clear to you, could you please let me know whether the way
> > I have used Analyzers is correct? I highly appreciate any comments.
> >
> > Thanks.
> > Manjula.
>
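
The analyzer difference discussed in this thread can be illustrated with a small sketch. It does not use Lucene: AnalyzerSketch, whitespaceTokens, standardLikeTokens and the short stop-word list are hypothetical stand-ins that only approximate the behaviour described above. WhitespaceAnalyzer splits on whitespace and leaves each term untouched, while StandardAnalyzer also splits on punctuation, lowercases, and removes stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-ins for the two analyzers; these are NOT Lucene classes.
class AnalyzerSketch {
    // Hypothetical, shortened stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "in", "is", "of", "the", "to");

    // Models WhitespaceAnalyzer: split on whitespace, change nothing else.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Roughly models StandardAnalyzer: split on non-alphanumeric characters,
    // lowercase each token, and drop stop words.
    static List<String> standardLikeTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "The Quick-Brown fox";
        System.out.println(whitespaceTokens(query));
        System.out.println(standardLikeTokens(query));
    }
}
```

When the query terms were already produced by another analyzer, as with the stemmed terms taken from the first index, the whitespace-style path passes them through unchanged, which is the behaviour Steve recommends for the query side of the second index.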
