lucene-java-user mailing list archives

From "Niraj Alok" <ni...@emacmillan.com>
Subject Re: score and frequency
Date Thu, 24 Jun 2004 12:15:31 GMT
Hi Franck,

Thank you so much for the detailed explanation.
However, when I tried to break up my MultiFieldQueryParser query into a series of
BooleanQueries, the result set shrank drastically.
Any idea why this could be happening?

Regards,
Niraj
----- Original Message -----
From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, June 24, 2004 2:54 PM
Subject: Re: score and frequency


> The MultiFieldQueryParser gives you a BooleanQuery containing one query for
> each field.
> Something like:
>             BooleanQuery
>             /   |   |   \
>           QF1  QF2 QF3  QF4    (QFx=Query for field x)
>
> You can still use the MultiFieldQueryParser and create a BooleanQuery to
> wrap the parsed query plus the PhraseQuery, i.e.:
>              BooleanQuery(created by you)
>               /       \
>             BQ      PhraseQuery
>
> Or build the whole query yourself (I think you should do that) and end up
> with something like this:
>              _BooleanQuery__
>             /   |   |   \   \
>           QF1  QF2 QF3  QF4  PhraseQuery      (QFx=Query for field x)
>
> It's like parsing the following query:
> (field1:query) (field2:query) (field3:query)...(fieldx:query)
> (title:"query")^boost
>
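As a rough illustration of the flattened form described above, the sketch below builds that query string for a list of fields. In the query parser syntax a boost is written with `^`. The class name, method name, field names and boost value are made up for this example; it handles a single-word query only and does no escaping of special characters.

```java
import java.util.Arrays;
import java.util.List;

public class FlatQuerySketch {
    // Builds: (f1:query) (f2:query) ... (titleField:"query")^boost
    static String build(List<String> fields, String titleField, String query, float boost) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            // one optional clause per searched field
            sb.append('(').append(field).append(':').append(query).append(") ");
        }
        // extra quoted (phrase) clause on the title, carrying the boost
        sb.append('(').append(titleField).append(":\"").append(query).append("\")^").append(boost);
        return sb.toString();
    }

    public static void main(String[] args) {
        // hypothetical field names standing in for the four fields in the thread
        List<String> fields = Arrays.asList("title", "firstpara", "restparas", "content");
        System.out.println(build(fields, "title", "lion", 10.0f));
    }
}
```

The resulting string could then be handed to a query parser, or the same shape could be built directly from query objects as in the third diagram.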
>
> Franck
>
>
> Niraj Alok wrote:
> > I asked the previous question because I do not know how to use PhraseQuery.
> >
> > I have one BooleanQuery and one Query.
> > The query is:
> > Query query = MultiFieldQueryParser.parse(qs, searchLoc, flags,
> >     new StandardAnalyzer(stop));
> >
> > where qs is the word to be searched for and searchLoc contains all
> > four fields.
> >
> > How do I insert a PhraseQuery here for the title field only, along with
> > its boost value?
> >
> >
> > Regards,
> > Niraj
> > ----- Original Message -----
> > From: "Niraj Alok" <niraj@emacmillan.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Thursday, June 24, 2004 2:00 PM
> > Subject: Re: score and frequency
> >
> >
> >
> >>Does it mean that I would need to abandon MultiFieldQueryParser?
> >>
> >>Regards,
> >>Niraj
> >>----- Original Message -----
> >>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>Sent: Thursday, June 24, 2004 1:22 PM
> >>Subject: Re: score and frequency
> >>
> >>
> >>
> >>>Hi,
> >>>first, what do you consider an 'exact match'? It seems that you
> >>>treat the search word by word, so 'lion sea' will be an 'exact match'
> >>>for 'sea-lion'.
> >>>I think you should add a PhraseQuery on the title to your query,
> >>>with a big boost. Then you don't need to boost your title field: only
> >>>the results matching exactly (for the PhraseQuery) will be boosted.
> >>>
> >>>Franck
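To make the word-by-word versus phrase distinction concrete, here is a minimal sketch in plain Java. It does not use Lucene: the `tokens` helper is a crude stand-in for an analyzer (lower-casing and splitting on non-letters), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PhraseVsTermsSketch {
    // Crude stand-in for an analyzer: lower-case and split on non-letters.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Word-by-word matching: every query term occurs somewhere in the title.
    static boolean matchesAllTerms(String title, String query) {
        return tokens(title).containsAll(tokens(query));
    }

    // Phrase matching: the query terms occur adjacent and in the same order.
    static boolean matchesPhrase(String title, String query) {
        List<String> q = tokens(query);
        return !q.isEmpty() && Collections.indexOfSubList(tokens(title), q) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesAllTerms("sea-lion", "lion sea")); // true
        System.out.println(matchesPhrase("sea-lion", "lion sea"));   // false
        System.out.println(matchesPhrase("sea-lion", "sea lion"));   // true
    }
}
```

Note that in this sketch a one-word phrase such as "lion" still matches inside "sea-lion", so a phrase clause helps mainly with multi-word searches.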
> >>>
> >>>
> >>>Niraj Alok wrote:
> >>>
> >>>>Hi Guys,
> >>>>
> >>>>I seem to have run into rough weather again.
> >>>>To describe the problem as concisely as possible, I have four fields to
> >>>>search upon: title, first para, rest of the paras, and content (equal to
> >>>>title + first para + rest of the paras). I am doing this by using
> >>>>MultiFieldQueryParser.
> >>>>
> >>>>Now there is a very complicated ranking algorithm specified by the
> >>>>client. I have met most of its rules except one or two, and I really need
> >>>>your help as all my other efforts have failed.
> >>>>
> >>>>The most important rule is that exactly matching titles should come first,
> >>>>i.e. get higher scores.
> >>>>
> >>>>I have given the title a higher boost factor than the rest, but the
> >>>>problem comes up when some other title has just one word matching.
> >>>>For example, if I search for lion, there is a title sea-lion which
> >>>>has the same boost factor as "lion" in the index. Also, sea-lion has
> >>>>some more occurrences of "lion" in its first para or rest of the paras,
> >>>>so its score comes out higher than that of "lion".
> >>
> >>>>Is there some way to get the exact matching titles higher scores?
> >>>>Please reply soon.
> >>>>
> >>>>
> >>>>Regards,
> >>>>Niraj
> >>>>
> >>>>
> >>>>----- Original Message -----
> >>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>Sent: Monday, June 07, 2004 12:50 PM
> >>>>Subject: Re: score and frequency
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>It seems that you don't want the length norm to be used. It's a factor
> >>>>>which normalizes the score of a doc depending on the size of the searched
> >>>>>field of the doc. It's the factor which makes 'ground ice' score higher
> >>>>>than 'ice hockey: British Sekonda Superleague Play-Off
> >>>>>Championship: finals', because it only has 2 terms.
> >>>>>So, I suggest you override the lengthNorm method and ignore the
> >>>>>numTokens parameter.
> >>>>>NB: The length norm is computed during indexing and the norms are
> >>>>>stored in the index (in the _aaa.f# files). So, you need to re-index
> >>>>>your data, and use this similarity during indexing.
> >>>>>
> >>>>>Cheers,
> >>>>>Franck
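A small sketch of the effect described above, in plain Java without Lucene. It assumes the default length norm is 1/sqrt(numTokens), as in DefaultSimilarity; the class and method names are hypothetical, and the token count of 9 for the long title is an assumption about how the analyzer splits it.

```java
public class LengthNormSketch {
    // Default-style length norm: shorter fields get a larger factor.
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // The suggested override: ignore numTokens so field length has no effect.
    static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int shortTitle = 2; // 'ground ice'
        int longTitle = 9;  // the long 'ice hockey: ...' title (assumed token count)
        System.out.println("default: " + defaultLengthNorm(shortTitle)
                + " vs " + defaultLengthNorm(longTitle));
        System.out.println("flat:    " + flatLengthNorm(shortTitle)
                + " vs " + flatLengthNorm(longTitle));
    }
}
```

With the default-style norm the two-term title gets roughly twice the factor of the nine-term one; with the flat version both get the same factor, so length no longer separates them.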
> >>>>>
> >>>>>
> >>>>>Niraj Alok wrote:
> >>>>>
> >>>>>
> >>>>>>I have called searcher.setSimilarity and also tried setting the
> >>>>>>coord factor to 1.
> >>>>>>
> >>>>>>Here is the problem by example: let's say I have titles to be
> >>>>>>displayed depending upon the search.
> >>>>>>E.g. if I have "ice hockey" as the search item and use the default
> >>>>>>similarity, my results are:
> >>>>>>
> >>>>>>ice hockey  0.99999994
> >>>>>>ice hockey  0.75
> >>>>>>ice hockey  0.75
> >>>>>>winter Olympics: hockey, ice, medallists  0.17402513
> >>>>>>ice age  0.073680125
> >>>>>>National Hockey League  0.020266924
> >>>>>>Cracking the Ice Age  0.018420031
> >>>>>>ground-ice  0.011512519
> >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.0069075115
> >>>>>>(the numbers are the scores).
> >>>>>>
> >>>>>>
> >>>>>>But if I set the similarity to my overridden one, the results become:
> >>>>>>ice hockey  0.99999994
> >>>>>>ice hockey  0.75
> >>>>>>ice hockey  0.75
> >>>>>>ice age  0.22104037
> >>>>>>winter Olympics: hockey, ice, medallists  0.17402513
> >>>>>>National Hockey League  0.060800765
> >>>>>>Cracking the Ice Age  0.055260092
> >>>>>>ground-ice  0.034537554
> >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.020722535
> >>>>>>
> >>>>>>
> >>>>>>I want all the titles which contain both "ice" and "hockey" to come
> >>>>>>above the rest (to have higher scores).
> >>>>>>That is, I would like the results to appear like this:
> >>>>>>
> >>>>>>ice hockey
> >>>>>>ice hockey
> >>>>>>ice hockey
> >>>>>>winter Olympics: hockey, ice, medallists
> >>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals
> >>>>>>ice age
> >>>>>>National Hockey League
> >>>>>>Cracking the Ice Age
> >>>>>>ground-ice
> >>>>>>
> >>>>>>My overridden similarity class contains just this method:
> >>>>>>public float coord(int overlap, int maxOverlap) {
> >>>>>>    // always 1.0, so matching more query terms earns no extra credit
> >>>>>>    return 1.0f;
> >>>>>>}
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>I feel it is the weight factor which is producing undesirable results.
> >>>>>>Any help in this regard would be highly appreciated.
> >>>>>>
> >>>>>>Regards,
> >>>>>>Niraj
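For reference, a sketch of what the coord factor contributes, in plain Java without Lucene. It assumes the default is overlap / maxOverlap, as in DefaultSimilarity; the class and method names are hypothetical.

```java
public class CoordSketch {
    // Default-style coord: the fraction of query terms found in the document.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The override from the message: every document gets the same factor.
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int queryTerms = 2; // "ice hockey"
        System.out.println(defaultCoord(2, queryTerms)); // both terms matched: 1.0
        System.out.println(defaultCoord(1, queryTerms)); // only "ice" matched: 0.5
        System.out.println(flatCoord(1, queryTerms));    // override: 1.0
    }
}
```

Under that assumption the default already rewards titles that contain both words, so returning a constant 1.0 may work against the stated goal of ranking those titles first.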
> >>>>>>
> >>>>>>----- Original Message -----
> >>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> >>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>Sent: Friday, June 04, 2004 8:46 PM
> >>>>>>Subject: Re: score and frequency
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Hi,
> >>>>>>>
> >>>>>>>Be careful to set the default similarity with
> >>>>>>>'Similarity.setDefault(similarity)' before creating your searcher
> >>>>>>>instance (IndexSearcher).
> >>>>>>>If you change the default similarity afterwards, you'll still use the
> >>>>>>>old one.
> >>>>>>>You'd better use the 'searcher.setSimilarity' method on your searcher.
> >>>>>>>
> >>>>>>>Franck
> >>>>>>>
> >>>>>>>
> >>>>>>>Phil brunet wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>Hi to all.
> >>>>>>>>
> >>>>>>>>Maybe the term frequency is not the only parameter you need to
> >>>>>>>>override to "customize" the score attributed by Lucene.
> >>>>>>>>
> >>>>>>>>Maybe you should consider the normalisation factor, the idf and the
> >>>>>>>>coord factor?
> >>>>>>>>
> >>>>>>>>Philippe
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
> >>>>>>>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
> >>>>>>>>>
> >>>>>>>>>Hi Erik,
> >>>>>>>>>
> >>>>>>>>>Thanks for the suggestion.
> >>>>>>>>>
> >>>>>>>>>I tried this:
> >>>>>>>>>import org.apache.lucene.search.DefaultSimilarity;
> >>>>>>>>>
> >>>>>>>>>public class RelevanceSimilarity extends DefaultSimilarity {
> >>>>>>>>>    // ignore how often the term occurs in the field
> >>>>>>>>>    public float tf(float freq) {
> >>>>>>>>>        System.out.println("discounting frequency");
> >>>>>>>>>        return 1.0f;
> >>>>>>>>>    }
> >>>>>>>>>}
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>and in my query class, I used:
> >>>>>>>>>
> >>>>>>>>>Similarity.setDefault(similarity);
> >>>>>>>>>Hits hits = is.search(query);
> >>>>>>>>>for (i = 0; i < hits.length(); i++)
> >>>>>>>>>    result = result + hits.score(i);
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>However, this is still not giving me the expected result. Do I need
> >>>>>>>>>to do something else?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Regards,
> >>>>>>>>>Niraj
> >>>>>>>>>
> >>>>>>>>>----- Original Message -----
> >>>>>>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
> >>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>>>>>>Sent: Friday, June 04, 2004 1:55 PM
> >>>>>>>>>Subject: Re: score and frequency
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>
> >>>>>>>>>>>I am having some problems with the score of lucene.
> >>>>>>>>>>>I am trying to get the results displayed according to hits.score
> >>>>>>>>>>>and it is giving the results correctly.
> >>>>>>>>>>>However I do not want the frequency factor to be used for the
> >>>>>>>>>>>computation of the score.
> >>>>>>>>>>>
> >>>>>>>>>>>Is it possible to get the score which does not have the frequency
> >>>>>>>>>>>factor in it?
> >>>>>>>>>>
> >>>>>>>>>>Have a look at the javadocs for Similarity. DefaultSimilarity is
> >>>>>>>>>>used unless otherwise specified. You could subclass that and
> >>>>>>>>>>override this:
> >>>>>>>>>>
> >>>>>>>>>> public float tf(float freq) {
> >>>>>>>>>>   return (float)Math.sqrt(freq);
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>>and have it return 1.0 instead. This might give you the effect you
> >>>>>>>>>>want.
> >>>>>>>>>>
> >>>>>>>>>>Erik
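To show what that override changes, here is a small plain-Java sketch without Lucene. It contrasts the square-root term-frequency factor quoted above with a constant one; the class and method names are hypothetical.

```java
public class TfSketch {
    // Term-frequency factor as quoted above: grows with the square root of freq.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Suggested override: the same factor however often the term occurs.
    static float flatTf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 4f, 9f}) {
            System.out.println("freq=" + freq
                    + " default=" + defaultTf(freq)
                    + " flat=" + flatTf(freq));
        }
    }
}
```

With the constant version, a document that repeats the term nine times gets no more credit from this factor than one that mentions it once; the other scoring factors still apply.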
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>---------------------------------------------------------------------
> >>>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>>>>>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>>>>>>--
> >>>>>>>Franck Brisbart
> >>>>>>>R&D
> >>>>>>>http://www.kelkoo.com
