lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brisbart Franck <Franck.Brisb...@kelkoo.net>
Subject Re: score and frequency
Date Mon, 07 Jun 2004 07:20:08 GMT
It seems that you don't the length norm to be used. It's a factor which 
normalize the score of a doc depending on the size of the searched field 
of the doc. It's the field which make that 'ground ice' has a higher 
score than 'ice hockey: British Sekonda Superleague Play-Off 
Championship: finals' because it only has 2 terms.
So, I suggest you to override the lengthNorm method and to ignore the 
numTokens parameter.
NB: The length norm is computed during the indexation and the norm are 
store in the index (in the _aaa.f# files). So, you need to do re-index 
your data, and use this similarity during the indexation.

Cheers,
Franck


Niraj Alok wrote:
> I have set the searcher.setSimilarity  as well as also tried setting the
> coord factor to 1.
> 
> The problem as given by an example is : Lets say I have titles to be
> displayed depending upon the search.
> E.g if i have "ice hockey" as the search item and if it is default
> similarity, my results are :
> 
> ice hockey0.99999994
> ice hockey0.75
> ice hockey0.75
> winter Olympics: hockey, ice, medallists0.17402513
> ice age0.073680125
> National Hockey League0.020266924
> Cracking the Ice Age0.018420031
> ground-ice0.011512519
> ice hockey: British Sekonda Superleague Play-Off Championship:
> finals0.0069075115
>  (the numbers indicating the score).
> 
> 
> But if i set the similarity as my overridden one, the results become:
> ice hockey0.99999994
> ice hockey0.75
> ice hockey0.75
> ice age0.22104037
> winter Olympics: hockey, ice, medallists0.17402513
> National Hockey League0.060800765
> Cracking the Ice Age0.055260092
> ground-ice0.034537554
> ice hockey: British Sekonda Superleague Play-Off Championship:
> finals0.020722535
> 
> 
> I want all the titles which have both "ice" and "hockey" to come above the
> rest (to have higher scores)
> Meaning i would wish the results to appear like:
> 
> ice hockey
> ice hockey
> ice hockey
> winter Olympics: hockey, ice, medallists
> ice hockey: British Sekonda Superleague Play-Off Championship: finals
> ice age
> National Hockey League
> Cracking the Ice Age
> ground-ice
> 
> My overriden similarity class contains just this method:
> public float coord(int overlap, int maxOverlap) {
> 
> return 1.0f;
> 
> }
> 
> 
> 
> 
> 
> I feel it is the weight factor which is producing indesirable results. Any
> help in this regard would be highly appreciated.
> 
> Regards,
> Niraj
> 
> ----- Original Message -----
> From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Friday, June 04, 2004 8:46 PM
> Subject: Re: score and frequency
> 
> 
> 
>>Hi,
>>
>>Be careful to set the default similarity
>>'Similarity.setDefault(similarity)' before creating your search instance
>>(IndexSearcher).
>>If you change the default similarity after, you'll still use the old one.
>>You'd better use the 'searcher.setSimilarity' method on your searcher.
>>
>>Franck
>>
>>
>>Phil brunet wrote:
>>
>>>Hi to all.
>>>
>>>Maybe the term frequency is not the only parameter you need to override
>>>to "customize" the score attributed by Lucene.
>>>
>>>Maybe you should consider the normalisation factor, the idf and the
>>>coord factor ?
>>>
>>>Philippe
>>>
>>>
>>>>From: "Niraj Alok" <niraj@emacmillan.com>
>>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>Subject: Re: score and frequency
>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
>>>>
>>>>Hi Erik,
>>>>
>>>>Thanks for the suggestion.
>>>>
>>>>I tried this:
>>>>public class RelevanceSimilarity extends DefaultSimilarity
>>>>
>>>>{
>>>>
>>>>public float tf(float freq) {
>>>>
>>>>System.out.println("discounting frequency");
>>>>
>>>>return (float)1;
>>>>
>>>>}
>>>>
>>>>}
>>>>
>>>>
>>>>
>>>>and in my query class, I used :
>>>>
>>>>Similarity.setDefault(similarity);
>>>>
>>>>Hits hits = is.search(query);
>>>>
>>>>for(i = 0; i < hits.length(); i ++)
>>>>
>>>>  result = result + hits.score(i);
>>>>
>>>>
>>>>
>>>>However, this is still not giving me the expected result. Do I need to
> 
> do
> 
>>>>something else?
>>>>
>>>>
>>>>Regards,
>>>>Niraj
>>>>
>>>>----- Original Message -----
>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>Sent: Friday, June 04, 2004 1:55 PM
>>>>Subject: Re: score and frequency
>>>>
>>>>
>>>>
>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>I am having some problems with the score of lucene.
>>>>>>I am trying to get the results displayed according to hits.score
> 
> and
> 
>>>>>>it is giving the results correctly.
>>>>>>However I do not want the frequency factor to be used for the
>>>>>>computation of the score.
>>>>>>
>>>>>>Is it possible to get the score which does not have the frequency
>>>>>>factor in it ?
>>>>>
>>>>>Have a look at the javadocs for Similarity.  DefaultSimilarity is
> 
> used
> 
>>>>>unless otherwise specified.  You could subclass that and override
> 
> this:
> 
>>>>>   public float tf(float freq) {
>>>>>     return (float)Math.sqrt(freq);
>>>>>   }
>>>>>
>>>>>and return 1.0.  This might give you the effect you want.
>>>>>
>>>>>Erik
>>>>>
>>>>>
>>>>>---------------------------------------------------------------------
>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>
>>>_________________________________________________________________
>>>Bloquez les fenĂȘtres pop-up, c'est gratuit ! http://toolbar.msn.fr
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>
>>
>>--
>>Franck Brisbart
>>R&D
>>http://www.kelkoo.com
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


-- 
Franck Brisbart
R&D
http://www.kelkoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message