lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Niraj Alok" <ni...@emacmillan.com>
Subject Re: score and frequency
Date Mon, 07 Jun 2004 16:30:47 GMT
Hi Frank,

Thanks for the suggestion. I would try that and let you know.

Regards,
Niraj

----- Original Message -----
From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, June 07, 2004 12:50 PM
Subject: Re: score and frequency


> It seems that you don't the length norm to be used. It's a factor which
> normalize the score of a doc depending on the size of the searched field
> of the doc. It's the field which make that 'ground ice' has a higher
> score than 'ice hockey: British Sekonda Superleague Play-Off
> Championship: finals' because it only has 2 terms.
> So, I suggest you to override the lengthNorm method and to ignore the
> numTokens parameter.
> NB: The length norm is computed during the indexation and the norm are
> store in the index (in the _aaa.f# files). So, you need to do re-index
> your data, and use this similarity during the indexation.
>
> Cheers,
> Franck
>
>
> Niraj Alok wrote:
> > I have set the searcher.setSimilarity  as well as also tried setting the
> > coord factor to 1.
> >
> > The problem as given by an example is : Lets say I have titles to be
> > displayed depending upon the search.
> > E.g if i have "ice hockey" as the search item and if it is default
> > similarity, my results are :
> >
> > ice hockey0.99999994
> > ice hockey0.75
> > ice hockey0.75
> > winter Olympics: hockey, ice, medallists0.17402513
> > ice age0.073680125
> > National Hockey League0.020266924
> > Cracking the Ice Age0.018420031
> > ground-ice0.011512519
> > ice hockey: British Sekonda Superleague Play-Off Championship:
> > finals0.0069075115
> >  (the numbers indicating the score).
> >
> >
> > But if i set the similarity as my overridden one, the results become:
> > ice hockey0.99999994
> > ice hockey0.75
> > ice hockey0.75
> > ice age0.22104037
> > winter Olympics: hockey, ice, medallists0.17402513
> > National Hockey League0.060800765
> > Cracking the Ice Age0.055260092
> > ground-ice0.034537554
> > ice hockey: British Sekonda Superleague Play-Off Championship:
> > finals0.020722535
> >
> >
> > I want all the titles which have both "ice" and "hockey" to come above
the
> > rest (to have higher scores)
> > Meaning i would wish the results to appear like:
> >
> > ice hockey
> > ice hockey
> > ice hockey
> > winter Olympics: hockey, ice, medallists
> > ice hockey: British Sekonda Superleague Play-Off Championship: finals
> > ice age
> > National Hockey League
> > Cracking the Ice Age
> > ground-ice
> >
> > My overriden similarity class contains just this method:
> > public float coord(int overlap, int maxOverlap) {
> >
> > return 1.0f;
> >
> > }
> >
> >
> >
> >
> >
> > I feel it is the weight factor which is producing indesirable results.
Any
> > help in this regard would be highly appreciated.
> >
> > Regards,
> > Niraj
> >
> > ----- Original Message -----
> > From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Friday, June 04, 2004 8:46 PM
> > Subject: Re: score and frequency
> >
> >
> >
> >>Hi,
> >>
> >>Be careful to set the default similarity
> >>'Similarity.setDefault(similarity)' before creating your search instance
> >>(IndexSearcher).
> >>If you change the default similarity after, you'll still use the old
one.
> >>You'd better use the 'searcher.setSimilarity' method on your searcher.
> >>
> >>Franck
> >>
> >>
> >>Phil brunet wrote:
> >>
> >>>Hi to all.
> >>>
> >>>Maybe the term frequency is not the only parameter you need to override
> >>>to "customize" the score attributed by Lucene.
> >>>
> >>>Maybe you should consider the normalisation factor, the idf and the
> >>>coord factor ?
> >>>
> >>>Philippe
> >>>
> >>>
> >>>>From: "Niraj Alok" <niraj@emacmillan.com>
> >>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>Subject: Re: score and frequency
> >>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
> >>>>
> >>>>Hi Erik,
> >>>>
> >>>>Thanks for the suggestion.
> >>>>
> >>>>I tried this:
> >>>>public class RelevanceSimilarity extends DefaultSimilarity
> >>>>
> >>>>{
> >>>>
> >>>>public float tf(float freq) {
> >>>>
> >>>>System.out.println("discounting frequency");
> >>>>
> >>>>return (float)1;
> >>>>
> >>>>}
> >>>>
> >>>>}
> >>>>
> >>>>
> >>>>
> >>>>and in my query class, I used :
> >>>>
> >>>>Similarity.setDefault(similarity);
> >>>>
> >>>>Hits hits = is.search(query);
> >>>>
> >>>>for(i = 0; i < hits.length(); i ++)
> >>>>
> >>>>  result = result + hits.score(i);
> >>>>
> >>>>
> >>>>
> >>>>However, this is still not giving me the expected result. Do I need to
> >
> > do
> >
> >>>>something else?
> >>>>
> >>>>
> >>>>Regards,
> >>>>Niraj
> >>>>
> >>>>----- Original Message -----
> >>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
> >>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>Sent: Friday, June 04, 2004 1:55 PM
> >>>>Subject: Re: score and frequency
> >>>>
> >>>>
> >>>>
> >>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
> >>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>I am having some problems with the score of lucene.
> >>>>>>I am trying to get the results displayed according to hits.score
> >
> > and
> >
> >>>>>>it is giving the results correctly.
> >>>>>>However I do not want the frequency factor to be used for the
> >>>>>>computation of the score.
> >>>>>>
> >>>>>>Is it possible to get the score which does not have the frequency
> >>>>>>factor in it ?
> >>>>>
> >>>>>Have a look at the javadocs for Similarity.  DefaultSimilarity is
> >
> > used
> >
> >>>>>unless otherwise specified.  You could subclass that and override
> >
> > this:
> >
> >>>>>   public float tf(float freq) {
> >>>>>     return (float)Math.sqrt(freq);
> >>>>>   }
> >>>>>
> >>>>>and return 1.0.  This might give you the effect you want.
> >>>>>
> >>>>>Erik
> >>>>>
> >>>>>
> >>>>>---------------------------------------------------------------------
> >>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>---------------------------------------------------------------------
> >>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>
> >>>
> >>>_________________________________________________________________
> >>>Bloquez les fenĂȘtres pop-up, c'est gratuit ! http://toolbar.msn.fr
> >>>
> >>>
> >>>---------------------------------------------------------------------
> >>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>
> >>
> >>
> >>--
> >>Franck Brisbart
> >>R&D
> >>http://www.kelkoo.com
> >>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
>
> --
> Franck Brisbart
> R&D
> http://www.kelkoo.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message