lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Extending the similarity class
Date Fri, 22 Jul 2005 16:34:11 GMT

On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote:

> Hello,
>   I am using lucene to search plain text, but the order of the search
> results is not satisfying to my needs. First, I want to know how the
> similarity works. Then, I need to extend it.

Use IndexSearcher.explain() to see how each individual hit is scored  
against a Query - this will be the clearest way to see why things  
score the way they do.

>   First, does the similarity class work on analyzed text or original
> search text? To be precise, does it count the stop words as found
> terms or not?

Only terms returned from the Analyzer are considered, so if a stop  
word is removed it does not count for tf or idf.
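
As a rough illustration of that point (plain Java, not Lucene code;
the stop list and the whitespace tokenizer here are hypothetical
stand-ins for whatever your Analyzer actually does), a term frequency
computed over analyzed tokens never sees the removed words:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWordTfSketch {
    // Hypothetical stop list for illustration only; not Lucene's own list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of"));

    // Counts how often `term` occurs among the tokens that survive
    // lowercasing and stop-word removal (a toy stand-in for an Analyzer).
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // removed tokens never reach the scorer
            }
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick fox and the lazy fox";
        System.out.println("tf(fox) = " + termFrequency(text, "fox"));
        System.out.println("tf(the) = " + termFrequency(text, "the"));
    }
}
```

Here "the" has a frequency of zero even though it appears twice in the
raw text, because it is dropped before any counting happens.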

>   Second, I want to add a factor for how close together the query
> terms appear in the text. For example, when I search for "quick fox",
> documents containing "fox quick" or "quick brown fox" should rank
> lower than those containing "quick fox".

This happens automatically when you use a PhraseQuery with a slop
factor: the closer together the words are, the better the score.
With a pure boolean query, however, proximity is not considered at
all (nor should it be). Try a large slop factor for phrases, such as
"quick fox"~100, and see how the scores change.
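
To give a feel for why closer matches score higher, here is a small
plain-Java sketch of the proximity weighting. It assumes the default
similarity weights a sloppy phrase match as 1 / (distance + 1); the
distances passed in (0 for an exact match, 1 for "quick brown fox",
2 for the reversed "fox quick") are illustrative inputs, not values
computed by Lucene here.

```java
class SloppyFreqSketch {
    // Assumed weighting for a sloppy phrase match: smaller distance
    // between the matched terms gives a larger contribution.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Illustrative distances for the query "quick fox".
        System.out.println("quick fox       -> " + sloppyFreq(0));
        System.out.println("quick brown fox -> " + sloppyFreq(1));
        System.out.println("fox quick       -> " + sloppyFreq(2));
    }
}
```

The real score also includes the other factors (idf, norms and so
on), so use explain() on your own index to see the actual numbers.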

     Erik


