lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brisbart Franck <Franck.Brisb...@kelkoo.net>
Subject Re: score and frequency
Date Thu, 24 Jun 2004 12:32:31 GMT
It may come from the boolean clauses. If you add your sub-queries with a 
'required' flag, you'll only get the results matching all the words in 
your query.
It can also come from the score which is different. If you set up a 
threshold to return the results, it can be the problem.

Franck


Niraj Alok wrote:
> Hi Franck,
> 
> Thank you so much for the detailed explanation.
> However, when I tried to break up my MultiFieldQueryParser into a series of
> BooleanQueries, the result set has got reduced drastically.
> Any idea why this could be happening?
> 
> Regards,
> Niraj
> ----- Original Message -----
> From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, June 24, 2004 2:54 PM
> Subject: Re: score and frequency
> 
> 
> 
>>The MultiFieldQueryParser give you a BooleanQuery containing 1 query for
>>each field.
>>Something like:
>>            BooleanQuery
>>            /   |   |   \
>>          QF1  QF2 QF3  QF4    (QFx=Query for field x)
>>
>>You can still use the MultiFieldQueryParser and create a BooleanQuery to
>>encapsulate the one parsed + the PhraseQuery, ie:
>>             BooleanQuery(created by you)
>>              /       \
>>            BQ      PhraseQuery
>>
>>Or create the whole query (I think you should do that) and have
>>something like that:
>>             _BooleanQuery__
>>            /   |   |   \   \
>>          QF1  QF2 QF3  QF4  PhraseQuery      (QFx=Query for field x)
>>
>>It's like parsing the following query:
>>(field1:query) (field2:query) (field3:query)...(fieldx:query)
>>(title:"query")~boost
>>
>>
>>Franck
>>
>>
>>Niraj Alok wrote:
>>
>>>I asked the previous question since I do not know how to use PhraseQuery
>>>
>>>I have one booleanquery and one query.
>>>The query is Query query =  MultiFieldQueryParser.parse( qs, searchLoc,
>>>flags, new StandardAnalyzer(stop));
>>>
>>>where qs is the word to be searched upon and searchLoc contains all the
> 
> four
> 
>>>fields.
>>>
>>>How do I insert a PhraseQuery here for title field only, and that too
> 
> with
> 
>>>its boosted value?
>>>
>>>
>>>Regards,
>>>Niraj
>>>----- Original Message -----
>>>From: "Niraj Alok" <niraj@emacmillan.com>
>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>Sent: Thursday, June 24, 2004 2:00 PM
>>>Subject: Re: score and frequency
>>>
>>>
>>>
>>>
>>>>Does it mean that I would need to abandon MultiFieldQueryParser?
>>>>
>>>>Regards,
>>>>Niraj
>>>>----- Original Message -----
>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>Sent: Thursday, June 24, 2004 1:22 PM
>>>>Subject: Re: score and frequency
>>>>
>>>>
>>>>
>>>>
>>>>>Hi,
>>>>>first, what do you consider as an 'exact matching' ? It seems that you
>>>>>treat the search word by word, so 'lion sea' will be an 'exact match'
> 
> of
> 
>>>>>'sea-lion'.
>>>>>I think you should add a PhraseQuery to your query containing the title
>>>>>and with a big boost. So, you don't need to boost your title field.
> 
> Only
> 
>>>>>the results matching exactly (for the PhraseQuery) will be boosted.
>>>>>
>>>>>Franck
>>>>>
>>>>>
>>>>>Niraj Alok wrote:
>>>>>
>>>>>
>>>>>>Hi Guys,
>>>>>>
>>>>>>I seem to have run into rough weather again.
>>>>>>To describe the problem as concisely as possible, I have four fields
>>>
>>>to
>>>
>>>
>>>>search upon : title , first para, rest of the paras and content (equal
> 
> to
> 
>>>>title + first para + rest of the para) .  I am doing this by using
>>>>MultiFieldQueryParser.
>>>>
>>>>
>>>>>>Now there is a very complicated ranking algrorithm specified by the
>>>>
>>>>client and I have met most of them except one or two and really need
> 
> your
> 
>>>>help as all my other efforts have failed.
>>>>
>>>>
>>>>>>The most important rule is that exact matching titles should come
>>>
>>>first
>>>
>>>
>>>>, i.e. get higher scores.
>>>>
>>>>
>>>>>>I have given the highest boost factor to the title than the rest but
>>>
>>>the
>>>
>>>
>>>>problem comes up when there is some other title which has got just one
>>>
>>>word
>>>
>>>
>>>>matching. For e.g., if I search for lion, there is a title sea-lion
> 
> which
> 
>>>>also has the same boost factor as that of "lion" in the index. Also,
>>>>sea-lion has got some more "lion" in its first para or rest of the paras
>>>>etc. such that its score comes higher than "lion".
>>>>
>>>>
>>>>>>Is there some way to get the exact matching titles higher scores?
>>>>>>Please reply soon.
>>>>>>
>>>>>>
>>>>>>Regards,
>>>>>>Niraj
>>>>>>
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>Sent: Monday, June 07, 2004 12:50 PM
>>>>>>Subject: Re: score and frequency
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>It seems that you don't the length norm to be used. It's a factor
>>>
>>>which
>>>
>>>
>>>>>>>normalize the score of a doc depending on the size of the searched
>>>
>>>field
>>>
>>>
>>>>>>>of the doc. It's the field which make that 'ground ice' has a
higher
>>>>>>>score than 'ice hockey: British Sekonda Superleague Play-Off
>>>>>>>Championship: finals' because it only has 2 terms.
>>>>>>>So, I suggest you to override the lengthNorm method and to ignore
the
>>>>>>>numTokens parameter.
>>>>>>>NB: The length norm is computed during the indexation and the
norm
> 
> are
> 
>>>>>>>store in the index (in the _aaa.f# files). So, you need to do
> 
> re-index
> 
>>>>>>>your data, and use this similarity during the indexation.
>>>>>>>
>>>>>>>Cheers,
>>>>>>>Franck
>>>>>>>
>>>>>>>
>>>>>>>Niraj Alok wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>I have set the searcher.setSimilarity  as well as also tried
setting
>>>>
>>>>the
>>>>
>>>>
>>>>>>>>coord factor to 1.
>>>>>>>>
>>>>>>>>The problem as given by an example is : Lets say I have titles
to be
>>>>>>>>displayed depending upon the search.
>>>>>>>>E.g if i have "ice hockey" as the search item and if it is
default
>>>>>>>>similarity, my results are :
>>>>>>>>
>>>>>>>>ice hockey0.99999994
>>>>>>>>ice hockey0.75
>>>>>>>>ice hockey0.75
>>>>>>>>winter Olympics: hockey, ice, medallists0.17402513
>>>>>>>>ice age0.073680125
>>>>>>>>National Hockey League0.020266924
>>>>>>>>Cracking the Ice Age0.018420031
>>>>>>>>ground-ice0.011512519
>>>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship:
>>>>>>>>finals0.0069075115
>>>>>>>>(the numbers indicating the score).
>>>>>>>>
>>>>>>>>
>>>>>>>>But if i set the similarity as my overridden one, the results
> 
> become:
> 
>>>>>>>>ice hockey0.99999994
>>>>>>>>ice hockey0.75
>>>>>>>>ice hockey0.75
>>>>>>>>ice age0.22104037
>>>>>>>>winter Olympics: hockey, ice, medallists0.17402513
>>>>>>>>National Hockey League0.060800765
>>>>>>>>Cracking the Ice Age0.055260092
>>>>>>>>ground-ice0.034537554
>>>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship:
>>>>>>>>finals0.020722535
>>>>>>>>
>>>>>>>>
>>>>>>>>I want all the titles which have both "ice" and "hockey" to
come
>>>
>>>above
>>>
>>>
>>>>the
>>>>
>>>>
>>>>>>>>rest (to have higher scores)
>>>>>>>>Meaning i would wish the results to appear like:
>>>>>>>>
>>>>>>>>ice hockey
>>>>>>>>ice hockey
>>>>>>>>ice hockey
>>>>>>>>winter Olympics: hockey, ice, medallists
>>>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship:
> 
> finals
> 
>>>>>>>>ice age
>>>>>>>>National Hockey League
>>>>>>>>Cracking the Ice Age
>>>>>>>>ground-ice
>>>>>>>>
>>>>>>>>My overriden similarity class contains just this method:
>>>>>>>>public float coord(int overlap, int maxOverlap) {
>>>>>>>>
>>>>>>>>return 1.0f;
>>>>>>>>
>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>I feel it is the weight factor which is producing indesirable
>>>
>>>results.
>>>
>>>
>>>>Any
>>>>
>>>>
>>>>>>>>help in this regard would be highly appreciated.
>>>>>>>>
>>>>>>>>Regards,
>>>>>>>>Niraj
>>>>>>>>
>>>>>>>>----- Original Message -----
>>>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>Sent: Friday, June 04, 2004 8:46 PM
>>>>>>>>Subject: Re: score and frequency
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>Be careful to set the default similarity
>>>>>>>>>'Similarity.setDefault(similarity)' before creating your
search
>>>>
>>>>instance
>>>>
>>>>
>>>>>>>>>(IndexSearcher).
>>>>>>>>>If you change the default similarity after, you'll still
use the
> 
> old
> 
>>>>one.
>>>>
>>>>
>>>>>>>>>You'd better use the 'searcher.setSimilarity' method on
your
>>>
>>>searcher.
>>>
>>>
>>>>>>>>>Franck
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Phil brunet wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Hi to all.
>>>>>>>>>>
>>>>>>>>>>Maybe the term frequency is not the only parameter
you need to
>>>>
>>>>override
>>>>
>>>>
>>>>>>>>>>to "customize" the score attributed by Lucene.
>>>>>>>>>>
>>>>>>>>>>Maybe you should consider the normalisation factor,
the idf and
> 
> the
> 
>>>>>>>>>>coord factor ?
>>>>>>>>>>
>>>>>>>>>>Philippe
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
>>>>>>>>>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>>>Subject: Re: score and frequency
>>>>>>>>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
>>>>>>>>>>>
>>>>>>>>>>>Hi Erik,
>>>>>>>>>>>
>>>>>>>>>>>Thanks for the suggestion.
>>>>>>>>>>>
>>>>>>>>>>>I tried this:
>>>>>>>>>>>public class RelevanceSimilarity extends DefaultSimilarity
>>>>>>>>>>>
>>>>>>>>>>>{
>>>>>>>>>>>
>>>>>>>>>>>public float tf(float freq) {
>>>>>>>>>>>
>>>>>>>>>>>System.out.println("discounting frequency");
>>>>>>>>>>>
>>>>>>>>>>>return (float)1;
>>>>>>>>>>>
>>>>>>>>>>>}
>>>>>>>>>>>
>>>>>>>>>>>}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>and in my query class, I used :
>>>>>>>>>>>
>>>>>>>>>>>Similarity.setDefault(similarity);
>>>>>>>>>>>
>>>>>>>>>>>Hits hits = is.search(query);
>>>>>>>>>>>
>>>>>>>>>>>for(i = 0; i < hits.length(); i ++)
>>>>>>>>>>>
>>>>>>>>>>>result = result + hits.score(i);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>However, this is still not giving me the expected
result. Do I
>>>
>>>need
>>>
>>>
>>>>to
>>>>
>>>>
>>>>>>>>do
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>something else?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Regards,
>>>>>>>>>>>Niraj
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message -----
>>>>>>>>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
>>>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>>>Sent: Friday, June 04, 2004 1:55 PM
>>>>>>>>>>>Subject: Re: score and frequency
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>I am having some problems with the score
of lucene.
>>>>>>>>>>>>>I am trying to get the results displayed
according to
> 
> hits.score
> 
>>>>>>>>and
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>it is giving the results correctly.
>>>>>>>>>>>>>However I do not want the frequency factor
to be used for the
>>>>>>>>>>>>>computation of the score.
>>>>>>>>>>>>>
>>>>>>>>>>>>>Is it possible to get the score which
does not have the
>>>
>>>frequency
>>>
>>>
>>>>>>>>>>>>>factor in it ?
>>>>>>>>>>>>
>>>>>>>>>>>>Have a look at the javadocs for Similarity.
 DefaultSimilarity
> 
> is
> 
>>>>>>>>used
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>unless otherwise specified.  You could subclass
that and
> 
> override
> 
>>>>>>>>this:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>public float tf(float freq) {
>>>>>>>>>>>>  return (float)Math.sqrt(freq);
>>>>>>>>>>>>}
>>>>>>>>>>>>
>>>>>>>>>>>>and return 1.0.  This might give you the effect
you want.
>>>>>>>>>>>>
>>>>>>>>>>>>Erik
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>>>>>>>>>--------------------------------------------------------------------
> 
> -
> 
>>>>>>>>>>>>To unsubscribe, e-mail:
>>>
>>>lucene-user-unsubscribe@jakarta.apache.org
>>>
>>>
>>>>>>>>>>>>For additional commands, e-mail:
>>>>
>>>>lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>---------------------------------------------------------------------
>>>>>>>>
>>>>>>>>>>>To unsubscribe, e-mail:
> 
> lucene-user-unsubscribe@jakarta.apache.org
> 
>>>>>>>>>>>For additional commands, e-mail:
>>>
>>>lucene-user-help@jakarta.apache.org
>>>
>>>
>>>>>>>>>>_________________________________________________________________
>>>>>>>>>>Bloquez les fenĂȘtres pop-up, c'est gratuit ! http://toolbar.msn.fr
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>>>>---------------------------------------------------------------------
>>>>>>>
>>>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>>>>For additional commands, e-mail:
>>>
>>>lucene-user-help@jakarta.apache.org
>>>
>>>
>>>>>>>>>--
>>>>>>>>>Franck Brisbart
>>>>>>>>>R&D
>>>>>>>>>http://www.kelkoo.com
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>>>---------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>>>For additional commands, e-mail:
> 
> lucene-user-help@jakarta.apache.org
> 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>---------------------------------------------------------------------
>>>>>>>
>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>--
>>>>>>>Franck Brisbart
>>>>>>>R&D
>>>>>>>http://www.kelkoo.com
>>>>>>>
>>>>>>>
>>>>>>>---------------------------------------------------------------------
>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>--
>>>>>Franck Brisbart
>>>>>R&D
>>>>>http://www.kelkoo.com
>>>>>
>>>>>
>>>>>---------------------------------------------------------------------
>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>
>>>
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>
>>
>>--
>>Franck Brisbart
>>R&D
>>http://www.kelkoo.com
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
> 
> 


-- 
Franck Brisbart
R&D
http://www.kelkoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message