lucene-dev mailing list archives

From Ahmed El-dawy <aseld...@gmail.com>
Subject Re: Extending the similarity class
Date Sat, 23 Jul 2005 08:45:01 GMT
> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.
But I need to compare against non-indexed words as well. By the way,
Google does this.

> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score.  However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be).  You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.
But that requires every word of the phrase to be present in a hit,
which is not always the case in my application. If I search for
"quick brown fox", then "quick fox" is an acceptable result.
I just need to know whether I have to re-sort the search results
myself according to my criteria, or whether there are methods I can
override so that the results come back already sorted.
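
To make the re-sorting option concrete, here is a rough sketch of what I
mean, written in plain Java with no Lucene calls at all. Everything in it
is an assumption for illustration: the class name ProximityRescorer, the
whitespace tokenizer, and the 0.5 weight are made up, and the score is
just "fraction of query terms found" plus a bonus for how tightly the
found terms sit together. Word order is not considered, and because it
works on the raw text, stop words are counted like any other term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing step: re-sort hit texts by a custom score.
class ProximityRescorer {

    // Naive tokenizer (assumption): lower-case and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Score = coverage + 0.5 * tightness (the weight is arbitrary).
    //   coverage:  distinct query terms found / distinct query terms
    //   tightness: found terms / length of the smallest window holding them
    static double score(List<String> queryTerms, List<String> docTokens) {
        Set<String> wanted = new HashSet<>(queryTerms);
        Set<String> present = new HashSet<>(docTokens);
        present.retainAll(wanted);
        if (present.isEmpty()) {
            return 0.0; // no query term occurs in this document
        }
        double coverage = (double) present.size() / wanted.size();

        // Brute-force search for the smallest window containing every found term.
        int best = Integer.MAX_VALUE;
        for (int start = 0; start < docTokens.size(); start++) {
            Set<String> seen = new HashSet<>();
            for (int end = start; end < docTokens.size(); end++) {
                String tok = docTokens.get(end);
                if (present.contains(tok)) {
                    seen.add(tok);
                }
                if (seen.size() == present.size()) {
                    best = Math.min(best, end - start + 1);
                    break;
                }
            }
        }
        double tightness = (double) present.size() / best;
        return coverage + 0.5 * tightness;
    }

    // Returns a copy of the hits, best score first; the input list is untouched.
    static List<String> rerank(String query, List<String> docs) {
        List<String> q = tokenize(query);
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (String d) -> score(q, tokenize(d))).reversed());
        return sorted;
    }
}
```

With the query "quick fox", this puts a hit containing "quick fox" ahead
of one containing "quick brown fox", and a hit with only some of the
query terms still gets a non-zero score instead of being dropped.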


On 7/22/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> 
> On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote:
> 
> > Hello,
> >   I am using lucene to search plain text, but the order of the search
> > results is not satisfying to my needs. First, I want to know how the
> > similarity works. Then, I need to extend it.
> 
> Use IndexSearcher.explain() to see how each individual hit is scored
> against a Query - this will be the clearest way to see why things
> score the way they do.
> 
> >   First, does the similarity class work on analyzed text or original
> > search text? To be precise, does it count the stop words as found
> > terms or not?
> 
> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.
> 
> >   Second, I want to add a factor of how relative are the terms of the
> > query found in text. For example, when I search for "quick fox", "fox
> > quick" and "quick brown fox" will be less ranked than "quick fox".
> 
> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score.  However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be).  You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.
> 
>     Erik
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 


-- 
Regards,
Ahmed Saad
