lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: combine query score with external score
Date Mon, 01 Feb 2010 09:42:46 GMT
Have you considered function queries? See
org.apache.lucene.search.function.CustomScoreQuery and friends.
If you provide your own CustomScoreQuery implementation, you can
score however you like.
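
To make the idea concrete, here is a minimal standalone sketch of the
arithmetic such a custom score would perform: the text query's score is
multiplied by the external category confidence instead of being added to it.
It does not depend on Lucene. The class name ExternalScoreCombiner, the
in-memory map, and the choice to score unknown documents as 0 are all
assumptions made for illustration. As far as I recall, the 3.0 hook to
override is customScore(int doc, float subQueryScore, float valSrcScore),
where valSrcScore comes from a ValueSourceQuery such as FieldScoreQuery;
check the Javadoc before relying on that.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone illustration only; not part of Lucene. */
final class ExternalScoreCombiner {

    // Hypothetical store: docId -> confidence in [0.0, 1.0) for ONE category,
    // e.g. "girls". In a real index this would come from a field or payload.
    private final Map<Integer, Float> categoryScores = new HashMap<Integer, Float>();

    void put(int doc, float confidence) {
        categoryScores.put(doc, confidence);
    }

    // Assumption: a document without a stored confidence counts as 0.0.
    float externalScore(int doc) {
        Float s = categoryScores.get(doc);
        return s == null ? 0.0f : s.floatValue();
    }

    // Multiplies the text score by the external score. No idf or queryNorm
    // is applied to the external value, and nothing is added.
    float customScore(int doc, float subQueryScore) {
        return subQueryScore * externalScore(doc);
    }

    public static void main(String[] args) {
        ExternalScoreCombiner girls = new ExternalScoreCombiner();
        girls.put(1, 0.2f); // document #1: cat=girls confidence from the example
        System.out.println(girls.customScore(1, 2.0f));
    }
}
```

Note that with a pure product, a document that has no confidence for the
requested category ends up with a score of 0; whether that is desirable
depends on the application.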


--
Ian.



On Mon, Feb 1, 2010 at 7:08 AM, Dennis Hendriksen
<dennis.hendriksen@kalooga.com> wrote:
> Hi Steve,
>
> Thank you for your suggestions. Payloads might indeed help me to
> overcome the precision loss problem that I am experiencing right now. I
> don't think it will help me with the combining of Lucene scores with
> external scores however.
>
> Is there anyone who has a suggestion how to deal with that?
>
> Dennis
>
> On Thu, 2010-01-28 at 13:52 -0500, Steven A Rowe wrote:
>> Hi Dennis,
>>
>> You should check out payloads (arbitrary per-index-term byte[] arrays), which
>> can be used to encode values which are then incorporated into documents'
>> scores, by overriding Similarity.scorePayload():
>>
>> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html#scorePayload%28int,%20java.lang.String,%20int,%20int,%20byte[],%20int,%20int%29>
>>
>> The Lucene in Action 2 MEAP has a nice introduction to using payloads to
>> influence scoring, in section 6.5.
>>
>> See also this (slightly out-of-date*) blog post "Getting Started with
>> Payloads" by Grant Ingersoll at Lucid Imagination:
>>
>> <http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/>
>>
>> *Note that since this blog post was written, BoostingTermQuery was renamed to
>> PayloadTermQuery (in Lucene 2.9.0+; see
>> http://issues.apache.org/jira/browse/LUCENE-1827 ; wow - this issue isn't
>> mentioned in CHANGES.txt???):
>>
>> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/payloads/PayloadTermQuery.html>
>>
>> Steve
>>
>> On 01/28/2010 at 6:01 AM, Dennis Hendriksen wrote:
>> > I'm struggling to create a performant query in Lucene 3.0.0 in which I
>> > want to combine 'regular' scoring with scores derived from external
>> > sources.
>> >
>> > For each document a fixed set of scores is calculated in the range [0.0,
>> > 1.0>. These scores represent the confidences that a document falls into
>> > categories. So for example document #1 has a score of 0.3 for cat=boys,
>> > 0.2 for cat=girls, 0.1 for cat=toys, 0.05 for cat=animals.
>> >
>> > The 'regular' scoring is calculated using a BooleanQuery with TermQuerys
>> > similar to: -type:H +(title:dna body:dna^1.5)
>> >
>> > In the current naive approach I'm combining the scores as follows:
>> > - for each document, store the three best categories in the following
>> >   fields:
>> >     name=cat1st value=boys fieldboost=0.3
>> >     name=cat2nd value=girls fieldboost=0.2
>> >     name=cat3rd value=toys fieldboost=0.1
>> > - at search time, use the following query if you're interested in 'girls':
>> >     -type:H +(title:dna body:dna^1.5) cat1st:girls cat2nd:girls cat3rd:girls
>> >   or if you're interested in 'boys':
>> >     -type:H +(title:dna body:dna^1.5) cat1st:boys cat2nd:boys cat3rd:boys
>> >
>> > Disadvantages of the current approach:
>> > - loss of precision encoding/decoding boosts (performance is important,
>> > so this might be acceptable)
>> > - using TermQuery for the cat fields doesn't make a lot of sense since
>> > the external scores are multiplied by the idf of 'boys'/'girls' and
>> > the querynorm
>> > - the resulting score from the cat field is added to the other query
>> > score instead of multiplied
>> >
>> > Just to give you an idea: the index I'm using is growing in time and
>> > contains about 50 million documents
>> >
>> > Do you have an idea how I can improve my query and still keep high
>> > performance? Or should I combine the scores in the Collector (but this
>> > doesn't seem the right place to retrieve the category scores from the
>> > index)? Is it possible to use a different float->byte encoder per field
>> > to reduce the lack of precision?
>> >
>> > Thanks for your time,
>> > Dennis
>>
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


