lucene-java-user mailing list archives

From Brisbart Franck <Franck.Brisb...@kelkoo.net>
Subject Re: score and frequency
Date Thu, 24 Jun 2004 09:24:36 GMT
The MultiFieldQueryParser gives you a BooleanQuery containing one query for
each field.
Something like:
            BooleanQuery
            /   |   |   \
          QF1  QF2 QF3  QF4    (QFx=Query for field x)

You can still use the MultiFieldQueryParser and create a BooleanQuery to
wrap the parsed query (BQ) plus the PhraseQuery, i.e.:
             BooleanQuery(created by you)
              /       \
            BQ      PhraseQuery

Or build the whole query yourself (I think you should do that) and have
something like this:
             _BooleanQuery__
            /   |   |   \   \
          QF1  QF2 QF3  QF4  PhraseQuery      (QFx=Query for field x)

It's like parsing the following query:
(field1:query) (field2:query) (field3:query)...(fieldx:query)
(title:"query")^boost
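
To make the flat query above concrete, here is a minimal sketch in plain Java
that only assembles the query string; it does not call Lucene at all. The
class name, the method name and every field name except "title" are
hypothetical, the boost of 10 is an arbitrary illustration, and the
field:(words) grouping is assumed to be accepted by the QueryParser version
in use. The search words are assumed to contain no query-syntax characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one clause per field plus a boosted phrase clause.
final class BoostedTitleQuery {

    // fields      - the fields searched word by word (QF1..QFx above)
    // words       - the raw search words, assumed free of special characters
    // phraseField - the field that gets the exact-phrase clause
    // boost       - boost applied to the phrase clause with the ^ operator
    static String build(String[] fields, String words, String phraseField, float boost) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            // field:(a b) applies the field to every word, not just the first
            clauses.add(field + ":(" + words + ")");
        }
        // Quoted text is parsed as a phrase; ^ attaches the boost to it
        clauses.add(phraseField + ":\"" + words + "\"^" + boost);
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Field names other than "title" are placeholders for the four fields
        String[] fields = {"title", "firstpara", "restparas", "content"};
        System.out.println(build(fields, "ice hockey", "title", 10.0f));
    }
}
```

The resulting string could then be handed to a single QueryParser instead of
MultiFieldQueryParser. Building the BooleanQuery and PhraseQuery objects
directly, as in the diagram, avoids any escaping problems with user input.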


Franck


Niraj Alok wrote:
> I asked the previous question since I do not know how to use PhraseQuery
> 
> I have one booleanquery and one query.
> The query is Query query =  MultiFieldQueryParser.parse( qs, searchLoc,
> flags, new StandardAnalyzer(stop));
> 
> where qs is the word to be searched upon and searchLoc contains all the four
> fields.
> 
> How do I insert a PhraseQuery here for title field only, and that too with
> its boosted value?
> 
> 
> Regards,
> Niraj
> ----- Original Message -----
> From: "Niraj Alok" <niraj@emacmillan.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, June 24, 2004 2:00 PM
> Subject: Re: score and frequency
> 
> 
> 
>>Does it mean that I would need to abandon MultiFieldQueryParser?
>>
>>Regards,
>>Niraj
>>----- Original Message -----
>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>Sent: Thursday, June 24, 2004 1:22 PM
>>Subject: Re: score and frequency
>>
>>
>>
>>>Hi,
>>>first, what do you consider an 'exact match'? It seems that you
>>>treat the search word by word, so 'lion sea' will be an 'exact match' of
>>>'sea-lion'.
>>>I think you should add a PhraseQuery on the title to your query,
>>>with a big boost. Then you don't need to boost your title field: only
>>>the results matching exactly (for the PhraseQuery) will be boosted.
>>>
>>>Franck
>>>
>>>
>>>Niraj Alok wrote:
>>>
>>>>Hi Guys,
>>>>
>>>>I seem to have run into rough weather again.
>>>>To describe the problem as concisely as possible, I have four fields to
>>>>search upon: title, first para, rest of the paras and content (equal to
>>>>title + first para + rest of the paras). I am doing this by using
>>>>MultiFieldQueryParser.
>>>>
>>>>Now there is a very complicated ranking algorithm specified by the
>>>>client and I have met most of the rules except one or two and really need
>>>>your help as all my other efforts have failed.
>>>>
>>>>The most important rule is that exact matching titles should come first,
>>>>i.e. get higher scores.
>>>>
>>>>I have given the title a higher boost factor than the rest, but the
>>>>problem comes up when there is some other title which has just one word
>>>>matching. E.g., if I search for lion, there is a title sea-lion which
>>>>also has the same boost factor as that of "lion" in the index. Also,
>>>>sea-lion has some more occurrences of "lion" in its first para or rest of
>>>>the paras etc., such that its score comes out higher than "lion".
>>
>>>>Is there some way to get the exact matching titles higher scores?
>>>>Please reply soon.
>>>>
>>>>
>>>>Regards,
>>>>Niraj
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>Sent: Monday, June 07, 2004 12:50 PM
>>>>Subject: Re: score and frequency
>>>>
>>>>
>>>>
>>>>
>>>>>It seems that you don't want the length norm to be used. It's a factor
>>>>>which normalizes the score of a doc depending on the size of the searched
>>>>>field of the doc. It's the factor which gives 'ground ice' a higher
>>>>>score than 'ice hockey: British Sekonda Superleague Play-Off
>>>>>Championship: finals' because it only has 2 terms.
>>>>>So, I suggest you override the lengthNorm method and ignore the
>>>>>numTokens parameter.
>>>>>NB: The length norm is computed at indexing time and the norms are
>>>>>stored in the index (in the _aaa.f# files). So, you need to re-index
>>>>>your data, and use this similarity during indexing.
>>>>>
>>>>>Cheers,
>>>>>Franck
>>>>>
>>>>>
>>>>>Niraj Alok wrote:
>>>>>
>>>>>
>>>>>>I have set searcher.setSimilarity and have also tried setting the
>>>>>>coord factor to 1.
>>>>>>
>>>>>>The problem, by way of an example: let's say I have titles to be
>>>>>>displayed depending upon the search.
>>>>>>E.g. if I have "ice hockey" as the search item and if it is the default
>>>>>>similarity, my results are:
>>>>>>
>>>>>>ice hockey  0.99999994
>>>>>>ice hockey  0.75
>>>>>>ice hockey  0.75
>>>>>>winter Olympics: hockey, ice, medallists  0.17402513
>>>>>>ice age  0.073680125
>>>>>>National Hockey League  0.020266924
>>>>>>Cracking the Ice Age  0.018420031
>>>>>>ground-ice  0.011512519
>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.0069075115
>>>>>>(the numbers indicating the score).
>>>>>>
>>>>>>
>>>>>>But if I set the similarity to my overridden one, the results become:
>>>>>>ice hockey  0.99999994
>>>>>>ice hockey  0.75
>>>>>>ice hockey  0.75
>>>>>>ice age  0.22104037
>>>>>>winter Olympics: hockey, ice, medallists  0.17402513
>>>>>>National Hockey League  0.060800765
>>>>>>Cracking the Ice Age  0.055260092
>>>>>>ground-ice  0.034537554
>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.020722535
>>>>>>
>>>>>>
>>>>>>I want all the titles which have both "ice" and "hockey" to come above
>>>>>>the rest (to have higher scores).
>>>>>>Meaning i would wish the results to appear like:
>>>>>>
>>>>>>ice hockey
>>>>>>ice hockey
>>>>>>ice hockey
>>>>>>winter Olympics: hockey, ice, medallists
>>>>>>ice hockey: British Sekonda Superleague Play-Off Championship: finals
>>>>>>ice age
>>>>>>National Hockey League
>>>>>>Cracking the Ice Age
>>>>>>ground-ice
>>>>>>
>>>>>>My overridden similarity class contains just this method:
>>>>>>public float coord(int overlap, int maxOverlap) {
>>>>>>    return 1.0f;
>>>>>>}
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>I feel it is the weight factor which is producing undesirable results.
>>>>>>Any help in this regard would be highly appreciated.
>>>>>>
>>>>>>Regards,
>>>>>>Niraj
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: "Brisbart Franck" <Franck.Brisbart@kelkoo.net>
>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>Sent: Friday, June 04, 2004 8:46 PM
>>>>>>Subject: Re: score and frequency
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>Be careful to set the default similarity
>>>>>>>'Similarity.setDefault(similarity)' before creating your search
>>>>>>>instance (IndexSearcher).
>>>>>>>If you change the default similarity afterwards, you'll still use the
>>>>>>>old one.
>>>>>>>You'd better use the 'searcher.setSimilarity' method on your searcher.
>>>>>>>
>>>>>>>Franck
>>>>>>>
>>>>>>>
>>>>>>>Phil brunet wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>Hi to all.
>>>>>>>>
>>>>>>>>Maybe the term frequency is not the only parameter you need to
>>>>>>>>override to "customize" the score attributed by Lucene.
>>>>>>>>
>>>>>>>>Maybe you should consider the normalisation factor, the idf and the
>>>>>>>>coord factor?
>>>>>>>>
>>>>>>>>Philippe
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>From: "Niraj Alok" <niraj@emacmillan.com>
>>>>>>>>>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>Subject: Re: score and frequency
>>>>>>>>>Date: Fri, 4 Jun 2004 15:13:32 +0530
>>>>>>>>>
>>>>>>>>>Hi Erik,
>>>>>>>>>
>>>>>>>>>Thanks for the suggestion.
>>>>>>>>>
>>>>>>>>>I tried this:
>>>>>>>>>public class RelevanceSimilarity extends DefaultSimilarity {
>>>>>>>>>    public float tf(float freq) {
>>>>>>>>>        System.out.println("discounting frequency");
>>>>>>>>>        return 1.0f;
>>>>>>>>>    }
>>>>>>>>>}
>>>>>>>>>
>>>>>>>>>and in my query class, I used:
>>>>>>>>>
>>>>>>>>>Similarity.setDefault(similarity);
>>>>>>>>>Hits hits = is.search(query);
>>>>>>>>>for (i = 0; i < hits.length(); i++)
>>>>>>>>>    result = result + hits.score(i);
>>>>>>>>>
>>>>>>>>>However, this is still not giving me the expected result. Do I need
>>>>>>>>>to do something else?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Regards,
>>>>>>>>>Niraj
>>>>>>>>>
>>>>>>>>>----- Original Message -----
>>>>>>>>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
>>>>>>>>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>>>>Sent: Friday, June 04, 2004 1:55 PM
>>>>>>>>>Subject: Re: score and frequency
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>
>>>>>>>>>>>I am having some problems with the score of lucene.
>>>>>>>>>>>I am trying to get the results displayed according to hits.score
>>>>>>>>>>>and it is giving the results correctly.
>>>>>>>>>>>However I do not want the frequency factor to be used for the
>>>>>>>>>>>computation of the score.
>>>>>>>>>>>
>>>>>>>>>>>Is it possible to get the score which does not have the frequency
>>>>>>>>>>>factor in it?
>>>>>>>>>>
>>>>>>>>>>Have a look at the javadocs for Similarity.  DefaultSimilarity is
>>>>>>>>>>used unless otherwise specified.  You could subclass that and
>>>>>>>>>>override this:
>>>>>>>>>>
>>>>>>>>>> public float tf(float freq) {
>>>>>>>>>>   return (float)Math.sqrt(freq);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>and return 1.0.  This might give you the effect you want.
>>>>>>>>>>
>>>>>>>>>>Erik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>---------------------------------------------------------------------
>>>>>>>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>>--
>>>>>>>Franck Brisbart
>>>>>>>R&D
>>>>>>>http://www.kelkoo.com
>>>>>>>
>>>>>>>


-- 
Franck Brisbart
R&D
http://www.kelkoo.com



