lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Niraj Alok" <ni...@emacmillan.com>
Subject Re: score and frequency
Date Thu, 24 Jun 2004 08:36:00 GMT
I asked the previous question since I do not know how to use PhraseQuery

I have one booleanquery and one query.
The query is Query query =  MultiFieldQueryParser.parse( qs, searchLoc,
flags, new StandardAnalyzer(stop));

where qs is the word to be searched upon and searchLoc contains all the four
fields.

How do I insert a PhraseQuery here for title field only, and that too with
its boosted value?


Regards,
Niraj
----- Original Message -----
From: "Niraj Alok" <niraj@emacmillan.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, June 24, 2004 2:00 PM
Subject: Re: score and frequency


> Does it mean that I would need to abandon MultiFieldQueryParser?
>
> Regards,
> Niraj
> ----- Original Message -----
> From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, June 24, 2004 1:22 PM
> Subject: Re: score and frequency
>
>
> > Hi,
> > first, what do you consider as an 'exact matching' ? It seems that you
> > treat the search word by word, so 'lion sea' will be an 'exact match' of
> > 'sea-lion'.
> > I think you should add a PhraseQuery to your query containing the title
> > and with a big boost. So, you don't need to boost your title field. Only
> > the results matching exactly (for the PhraseQuery) will be boosted.
> >
> > Franck
> >
> >
> > Niraj Alok wrote:
> > > Hi Guys,
> > >
> > > I seem to have run into rough weather again.
> > > To describe the problem as concisely as possible, I have four fields
to
> search upon : title , first para, rest of the paras and content (equal to
> title + first para + rest of the para) .  I am doing this by using
> MultiFieldQueryParser.
> > >
> > > Now there is a very complicated ranking algrorithm specified by the
> client and I have met most of them except one or two and really need your
> help as all my other efforts have failed.
> > >
> > > The most important rule is that exact matching titles should come
first
> , i.e. get higher scores.
> > > I have given the highest boost factor to the title than the rest but
the
> problem comes up when there is some other title which has got just one
word
> matching. For e.g., if I search for lion, there is a title sea-lion which
> also has the same boost factor as that of "lion" in the index. Also,
> sea-lion has got some more "lion" in its first para or rest of the paras
> etc. such that its score comes higher than "lion".
> > >
> > > Is there some way to get the exact matching titles higher scores?
> > > Please reply soon.
> > >
> > >
> > > Regards,
> > > Niraj
> > >
> > >
> > > ----- Original Message -----
> > > From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> > > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > > Sent: Monday, June 07, 2004 12:50 PM
> > > Subject: Re: score and frequency
> > >
> > >
> > >
> > >>It seems that you don't the length norm to be used. It's a factor
which
> > >>normalize the score of a doc depending on the size of the searched
field
> > >>of the doc. It's the field which make that 'ground ice' has a higher
> > >>score than 'ice hockey: British Sekonda Superleague Play-Off
> > >>Championship: finals' because it only has 2 terms.
> > >>So, I suggest you to override the lengthNorm method and to ignore the
> > >>numTokens parameter.
> > >>NB: The length norm is computed during the indexation and the norm are
> > >>store in the index (in the _aaa.f# files). So, you need to do re-index
> > >>your data, and use this similarity during the indexation.
> > >>
> > >>Cheers,
> > >>Franck
> > >>
> > >>
> > >>Niraj Alok wrote:
> > >>
> > >>>I have set the searcher.setSimilarity  as well as also tried setting
> the
> > >>>coord factor to 1.
> > >>>
> > >>>The problem as given by an example is : Lets say I have titles to be
> > >>>displayed depending upon the search.
> > >>>E.g if i have "ice hockey" as the search item and if it is default
> > >>>similarity, my results are :
> > >>>
> > >>>ice hockey0.99999994
> > >>>ice hockey0.75
> > >>>ice hockey0.75
> > >>>winter Olympics: hockey, ice, medallists0.17402513
> > >>>ice age0.073680125
> > >>>National Hockey League0.020266924
> > >>>Cracking the Ice Age0.018420031
> > >>>ground-ice0.011512519
> > >>>ice hockey: British Sekonda Superleague Play-Off Championship:
> > >>>finals0.0069075115
> > >>> (the numbers indicating the score).
> > >>>
> > >>>
> > >>>But if i set the similarity as my overridden one, the results become:
> > >>>ice hockey0.99999994
> > >>>ice hockey0.75
> > >>>ice hockey0.75
> > >>>ice age0.22104037
> > >>>winter Olympics: hockey, ice, medallists0.17402513
> > >>>National Hockey League0.060800765
> > >>>Cracking the Ice Age0.055260092
> > >>>ground-ice0.034537554
> > >>>ice hockey: British Sekonda Superleague Play-Off Championship:
> > >>>finals0.020722535
> > >>>
> > >>>
> > >>>I want all the titles which have both "ice" and "hockey" to come
above
> the
> > >>>rest (to have higher scores)
> > >>>Meaning i would wish the results to appear like:
> > >>>
> > >>>ice hockey
> > >>>ice hockey
> > >>>ice hockey
> > >>>winter Olympics: hockey, ice, medallists
> > >>>ice hockey: British Sekonda Superleague Play-Off Championship: finals
> > >>>ice age
> > >>>National Hockey League
> > >>>Cracking the Ice Age
> > >>>ground-ice
> > >>>
> > >>>My overriden similarity class contains just this method:
> > >>>public float coord(int overlap, int maxOverlap) {
> > >>>
> > >>>return 1.0f;
> > >>>
> > >>>}
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>I feel it is the weight factor which is producing indesirable
results.
> Any
> > >>>help in this regard would be highly appreciated.
> > >>>
> > >>>Regards,
> > >>>Niraj
> > >>>
> > >>>----- Original Message -----
> > >>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> > >>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > >>>Sent: Friday, June 04, 2004 8:46 PM
> > >>>Subject: Re: score and frequency
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>>Hi,
> > >>>>
> > >>>>Be careful to set the default similarity
> > >>>>'Similarity.setDefault(similarity)' before creating your search
> instance
> > >>>>(IndexSearcher).
> > >>>>If you change the default similarity after, you'll still use the
old
> one.
> > >>>>You'd better use the 'searcher.setSimilarity' method on your
searcher.
> > >>>>
> > >>>>Franck
> > >>>>
> > >>>>
> > >>>>Phil brunet wrote:
> > >>>>
> > >>>>
> > >>>>>Hi to all.
> > >>>>>
> > >>>>>Maybe the term frequency is not the only parameter you need
to
> override
> > >>>>>to "customize" the score attributed by Lucene.
> > >>>>>
> > >>>>>Maybe you should consider the normalisation factor, the idf
and the
> > >>>>>coord factor ?
> > >>>>>
> > >>>>>Philippe
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
> > >>>>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > >>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > >>>>>>Subject: Re: score and frequency
> > >>>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
> > >>>>>>
> > >>>>>>Hi Erik,
> > >>>>>>
> > >>>>>>Thanks for the suggestion.
> > >>>>>>
> > >>>>>>I tried this:
> > >>>>>>public class RelevanceSimilarity extends DefaultSimilarity
> > >>>>>>
> > >>>>>>{
> > >>>>>>
> > >>>>>>public float tf(float freq) {
> > >>>>>>
> > >>>>>>System.out.println("discounting frequency");
> > >>>>>>
> > >>>>>>return (float)1;
> > >>>>>>
> > >>>>>>}
> > >>>>>>
> > >>>>>>}
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>and in my query class, I used :
> > >>>>>>
> > >>>>>>Similarity.setDefault(similarity);
> > >>>>>>
> > >>>>>>Hits hits = is.search(query);
> > >>>>>>
> > >>>>>>for(i = 0; i < hits.length(); i ++)
> > >>>>>>
> > >>>>>> result = result + hits.score(i);
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>However, this is still not giving me the expected result.
Do I
need
> to
> > >>>
> > >>>do
> > >>>
> > >>>
> > >>>>>>something else?
> > >>>>>>
> > >>>>>>
> > >>>>>>Regards,
> > >>>>>>Niraj
> > >>>>>>
> > >>>>>>----- Original Message -----
> > >>>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
> > >>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > >>>>>>Sent: Friday, June 04, 2004 1:55 PM
> > >>>>>>Subject: Re: score and frequency
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>Hi,
> > >>>>>>>>
> > >>>>>>>>I am having some problems with the score of lucene.
> > >>>>>>>>I am trying to get the results displayed according
to hits.score
> > >>>
> > >>>and
> > >>>
> > >>>
> > >>>>>>>>it is giving the results correctly.
> > >>>>>>>>However I do not want the frequency factor to be
used for the
> > >>>>>>>>computation of the score.
> > >>>>>>>>
> > >>>>>>>>Is it possible to get the score which does not have
the
frequency
> > >>>>>>>>factor in it ?
> > >>>>>>>
> > >>>>>>>Have a look at the javadocs for Similarity.  DefaultSimilarity
is
> > >>>
> > >>>used
> > >>>
> > >>>
> > >>>>>>>unless otherwise specified.  You could subclass that
and override
> > >>>
> > >>>this:
> > >>>
> > >>>
> > >>>>>>>  public float tf(float freq) {
> > >>>>>>>    return (float)Math.sqrt(freq);
> > >>>>>>>  }
> > >>>>>>>
> > >>>>>>>and return 1.0.  This might give you the effect you
want.
> > >>>>>>>
> > >>>>>>>Erik
> > >>>>>>>
> > >>>>>>>
> >
>
>>>>>>>---------------------------------------------------------------------
> > >>>>>>>To unsubscribe, e-mail:
lucene-user-unsubscribe@jakarta.apache.org
> > >>>>>>>For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> >
>
>>>>>>---------------------------------------------------------------------
> > >>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>>>>>For additional commands, e-mail:
lucene-user-help@jakarta.apache.org
> > >>>>>>
> > >>>>>
> > >>>>>_________________________________________________________________
> > >>>>>Bloquez les fenĂȘtres pop-up, c'est gratuit ! http://toolbar.msn.fr
> > >>>>>
> > >>>>>
> >
>>>>>---------------------------------------------------------------------
> > >>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>>>>For additional commands, e-mail:
lucene-user-help@jakarta.apache.org
> > >>>>>
> > >>>>
> > >>>>
> > >>>>--
> > >>>>Franck Brisbart
> > >>>>R&D
> > >>>>http://www.kelkoo.com
> > >>>>
> > >>>>
> >
>>>>---------------------------------------------------------------------
> > >>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>---------------------------------------------------------------------
> > >>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >>>
> > >>
> > >>
> > >>--
> > >>Franck Brisbart
> > >>R&D
> > >>http://www.kelkoo.com
> > >>
> > >>
> > >>---------------------------------------------------------------------
> > >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> > >
> > >
> >
> >
> > --
> > Franck Brisbart
> > R&D
> > http://www.kelkoo.com
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message