lucene-java-user mailing list archives

From "Niraj Alok" <ni...@emacmillan.com>
Subject Re: score and frequency
Date Mon, 14 Jun 2004 07:47:59 GMT
Hi,

I am facing a peculiar problem with the changes mentioned here.
In the explanation of the score, I find the following products. The scores
calculated by Lucene (hits.score) are listed alongside them.

search result                                                             product from    hits.score
                                                                          explain
1) ice hockey                                                             1.23028304E8    1
2) ice hockey                                                             8.3813824E7     0.6812565
3) ice hockey                                                             8.3813824E7     0.6812565
4) winter Olympics: hockey, ice, medallists                               5.2345928E7     0.4254788
5) ice age                                                                8960174.0       0.05670582
6) National Hockey League                                                 3.3130242E7     0.039581057
7) ice hockey: British Sekonda Superleague Play-Off Championship: finals  4.0383384E7     0.0372818


As you can see, the product for result 7 is higher than that for result 5, yet
its score is lower.
Can anyone please suggest why the scores show this discrepancy?
Also, is it possible to directly use the product from explain itself instead
of the hits.score? If I can somehow use the product, it will solve all my
problems.

Regards,
Niraj
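
One way to check the figures above is to compare each hits.score with its explain product divided by the top product. The sketch below does only that arithmetic in plain Java, with no Lucene dependency. The class and method names are hypothetical, and it assumes that Hits rescales raw scores by dividing them by the top raw score when that score exceeds 1.0.

```java
/** Hypothetical helper: compares reported hits.score values with product / topProduct. */
final class ScoreCheck {
    // Products copied from the explain output quoted in this message (results 1 to 7).
    static final double[] PRODUCTS = {
        1.23028304E8, 8.3813824E7, 8.3813824E7, 5.2345928E7,
        8960174.0, 3.3130242E7, 4.0383384E7
    };

    // hits.score values reported for the same results.
    static final double[] SCORES = {
        1.0, 0.6812565, 0.6812565, 0.4254788,
        0.05670582, 0.039581057, 0.0372818
    };

    /** Assumed normalization: raw value divided by the top raw value. */
    static double normalized(int i) {
        return PRODUCTS[i] / PRODUCTS[0];
    }

    /** True when the reported score matches product / topProduct within a small tolerance. */
    static boolean consistent(int i) {
        return Math.abs(normalized(i) - SCORES[i]) < 1e-4;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PRODUCTS.length; i++) {
            System.out.printf("result %d: product/top=%.7f  hits.score=%.7f  consistent=%b%n",
                i + 1, normalized(i), SCORES[i], consistent(i));
        }
    }
}
```

With these numbers, results 1 to 4 match the ratio and results 5 to 7 do not. That suggests the product quoted for the last three may not be the complete raw score (for example, it might omit a factor such as coord). This is an inference from the arithmetic only, not something confirmed in this thread.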

----- Original Message -----
From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, June 07, 2004 12:50 PM
Subject: Re: score and frequency


> It seems that you don't want the length norm to be used. It's a factor which
> normalizes the score of a doc depending on the size of the searched field
> of the doc. It's this factor which makes 'ground ice' score higher
> than 'ice hockey: British Sekonda Superleague Play-Off
> Championship: finals', because it only has 2 terms.
> So, I suggest you override the lengthNorm method and ignore the
> numTokens parameter.
> NB: The length norm is computed at indexing time and the norms are
> stored in the index (in the _aaa.f# files). So you need to re-index
> your data, and use this similarity during indexing.
>
> Cheers,
> Franck
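
The effect of the length norm described above can be sketched in plain Java. This is a standalone model, not the Lucene class: it assumes the default formula is 1 / sqrt(numTokens), the class and method names are hypothetical, and the token count of 9 for the long title is an assumption about how the analyzer splits it. In Lucene itself the corresponding change would be an override of lengthNorm(String fieldName, int numTokens) in a DefaultSimilarity subclass.

```java
/** Standalone model of the length norm; not the Lucene class itself. */
final class LengthNormSketch {

    /** Default-style norm (assumed formula): shorter fields get a larger factor. */
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    /** Flat norm that ignores numTokens, so field length no longer affects the score. */
    static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int shortTitle = 2; // e.g. a two-term title
        int longTitle = 9;  // assumed token count for the long Superleague title

        // With the default-style norm the short title is boosted relative to the long one.
        System.out.println("default: " + defaultLengthNorm(shortTitle)
            + " vs " + defaultLengthNorm(longTitle));

        // With the flat norm both titles get the same factor.
        System.out.println("flat:    " + flatLengthNorm(shortTitle)
            + " vs " + flatLengthNorm(longTitle));
    }
}
```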
>
>
> Niraj Alok wrote:
> > I have used searcher.setSimilarity and have also tried setting the
> > coord factor to 1.
> >
> > To illustrate the problem: let's say I have titles to be
> > displayed depending upon the search.
> > E.g. if I have "ice hockey" as the search item and use the default
> > similarity, my results are:
> >
> > ice hockey                                                             0.99999994
> > ice hockey                                                             0.75
> > ice hockey                                                             0.75
> > winter Olympics: hockey, ice, medallists                               0.17402513
> > ice age                                                                0.073680125
> > National Hockey League                                                 0.020266924
> > Cracking the Ice Age                                                   0.018420031
> > ground-ice                                                             0.011512519
> > ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.0069075115
> >  (the numbers indicating the score).
> >
> >
> > But if i set the similarity as my overridden one, the results become:
> > ice hockey                                                             0.99999994
> > ice hockey                                                             0.75
> > ice hockey                                                             0.75
> > ice age                                                                0.22104037
> > winter Olympics: hockey, ice, medallists                               0.17402513
> > National Hockey League                                                 0.060800765
> > Cracking the Ice Age                                                   0.055260092
> > ground-ice                                                             0.034537554
> > ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.020722535
> >
> >
> > I want all the titles which contain both "ice" and "hockey" to come above the
> > rest (to have higher scores).
> > That is, I would like the results to appear like this:
> >
> > ice hockey
> > ice hockey
> > ice hockey
> > winter Olympics: hockey, ice, medallists
> > ice hockey: British Sekonda Superleague Play-Off Championship: finals
> > ice age
> > National Hockey League
> > Cracking the Ice Age
> > ground-ice
> >
> > My overridden similarity class contains just this method:
> > public float coord(int overlap, int maxOverlap) {
> >     return 1.0f;
> > }
> >
> > I feel it is the weight factor which is producing undesirable results. Any
> > help in this regard would be highly appreciated.
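
A point worth checking here: a coord override that always returns 1.0 removes the factor that rewards documents for matching more of the query terms. The sketch below is a standalone model in plain Java, assuming the default coord is overlap / maxOverlap; the class and method names are hypothetical, and it does not by itself show that restoring coord yields the exact ordering wanted above.

```java
/** Standalone model of the coord factor; not the Lucene class itself. */
final class CoordSketch {

    /** Default-style coord (assumed formula): fraction of query terms the document matches. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Flat coord: every document gets the same factor regardless of how many terms match. */
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int queryTerms = 2; // the query "ice hockey" has two terms

        // A title containing both terms versus a title containing only one of them.
        System.out.println("default: both=" + defaultCoord(2, queryTerms)
            + " one=" + defaultCoord(1, queryTerms));
        System.out.println("flat:    both=" + flatCoord(2, queryTerms)
            + " one=" + flatCoord(1, queryTerms));
    }
}
```

Under the default-style factor, a title matching only "ice" is scaled down relative to one matching both terms; under the flat override that distinction disappears.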
> >
> > Regards,
> > Niraj
> >
> > ----- Original Message -----
> > From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Friday, June 04, 2004 8:46 PM
> > Subject: Re: score and frequency
> >
> >
> >
> >>Hi,
> >>
> >>Be careful to set the default similarity
> >>'Similarity.setDefault(similarity)' before creating your search instance
> >>(IndexSearcher).
> >>If you change the default similarity afterwards, you'll still use the old one.
> >>You'd better use the 'searcher.setSimilarity' method on your searcher.
> >>
> >>Franck
> >>
> >>
> >>Phil brunet wrote:
> >>
> >>>Hi to all.
> >>>
> >>>Maybe the term frequency is not the only parameter you need to override
> >>>to "customize" the score attributed by Lucene.
> >>>
> >>>Maybe you should consider the normalisation factor, the idf and the
> >>>coord factor ?
> >>>
> >>>Philippe
> >>>
> >>>
> >>>>From: "Niraj Alok" <niraj@emacmillan.com>
> >>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>Subject: Re: score and frequency
> >>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
> >>>>
> >>>>Hi Erik,
> >>>>
> >>>>Thanks for the suggestion.
> >>>>
> >>>>I tried this:
> >>>>import org.apache.lucene.search.DefaultSimilarity;
> >>>>
> >>>>public class RelevanceSimilarity extends DefaultSimilarity {
> >>>>    // Ignore term frequency: every matching term contributes the same tf.
> >>>>    public float tf(float freq) {
> >>>>        System.out.println("discounting frequency");
> >>>>        return 1.0f;
> >>>>    }
> >>>>}
> >>>>
> >>>>
> >>>>
> >>>>and in my query class, I used:
> >>>>
> >>>>Similarity.setDefault(similarity);
> >>>>Hits hits = is.search(query);
> >>>>for (i = 0; i < hits.length(); i++)
> >>>>    result = result + hits.score(i);
> >>>>
> >>>>
> >>>>
> >>>>However, this is still not giving me the expected result. Do I need to do
> >>>>something else?
> >>>>
> >>>>
> >>>>Regards,
> >>>>Niraj
> >>>>
> >>>>----- Original Message -----
> >>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
> >>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>Sent: Friday, June 04, 2004 1:55 PM
> >>>>Subject: Re: score and frequency
> >>>>
> >>>>
> >>>>
> >>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
> >>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>I am having some problems with the score of lucene.
> >>>>>>I am trying to get the results displayed according to hits.score and
> >>>>>>it is giving the results correctly.
> >>>>>>However I do not want the frequency factor to be used for the
> >>>>>>computation of the score.
> >>>>>>
> >>>>>>Is it possible to get the score which does not have the frequency
> >>>>>>factor in it ?
> >>>>>
> >>>>>Have a look at the javadocs for Similarity.  DefaultSimilarity is used
> >>>>>unless otherwise specified.  You could subclass that and override this:
> >>>>>
> >>>>>   public float tf(float freq) {
> >>>>>     return (float)Math.sqrt(freq);
> >>>>>   }
> >>>>>
> >>>>>and return 1.0.  This might give you the effect you want.
> >>>>>
> >>>>>Erik
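
The difference between the default tf quoted above and the flat replacement can be sketched in plain Java. This is a standalone model with hypothetical class and method names; only the sqrt(freq) formula and the constant 1.0 come from the message above.

```java
/** Standalone model of the tf factor; not the Lucene class itself. */
final class TfSketch {

    /** Default tf as quoted above: grows with the square root of the term frequency. */
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Flat tf: a term that occurs many times counts the same as one that occurs once. */
    static float flatTf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // A term occurring once versus four times in the same field.
        System.out.println("default: " + defaultTf(1f) + " vs " + defaultTf(4f));
        System.out.println("flat:    " + flatTf(1f) + " vs " + flatTf(4f));
    }
}
```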
> >>>>>
> >>>>>
> >>>>>---------------------------------------------------------------------
> >>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>--
> >>Franck Brisbart
> >>R&D
> >>http://www.kelkoo.com
> >>
> >>
> >>
> >
> >
> >
> >
> >
> >
>
>
> --
> Franck Brisbart
> R&D
> http://www.kelkoo.com
>
>
>





