lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Question about how to speed up custom scoring
Date Fri, 09 Oct 2009 21:00:23 GMT
Oops, I just reread your message and realized you wanted query-time weights.
Payloads are an index-time thing.

On Oct 9, 2009, at 5:49 PM, Grant Ingersoll wrote:

> If you are trying to add specific term weights to terms in the index  
> and then incorporate them into scoring, you might benefit from  
> payloads and the PayloadTermQuery option.  See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
>
> -Grant
>
> On Oct 8, 2009, at 11:56 AM, scott w wrote:
>
>> Oops, forgot to include the class I mentioned. Here it is:
>>
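>> import java.io.IOException;
>> import java.util.Map;
>>
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.search.Query;
>> import org.apache.lucene.search.function.CustomScoreQuery;
>>
>> // SearchException is not defined in this message; it is presumably an
>> // unchecked, application-specific exception.
>> public class QueryTermBoostingQuery extends CustomScoreQuery {
>>   // Maps a field name to the weight applied to that field's numeric value.
>>   private final Map<String, Float> queryTermWeights;
>>   // Share of the final score taken from the relevance score, in [0, 1].
>>   private final float bias;
>>   private final IndexReader indexReader;
>>
>>   public QueryTermBoostingQuery(Query q, Map<String, Float> termWeights,
>>       IndexReader indexReader, float bias) {
>>     super(q);
>>     if (bias < 0 || bias > 1) {
>>       throw new IllegalArgumentException("Bias must be between 0 and 1");
>>     }
>>     this.indexReader = indexReader;
>>     this.bias = bias;
>>     this.queryTermWeights = termWeights;
>>   }
>>
>>   @Override
>>   public float customScore(int doc, float subQueryScore, float valSrcScore) {
>>     // Loads the stored fields of the document being scored.
>>     Document document;
>>     try {
>>       document = indexReader.document(doc);
>>     } catch (IOException e) {
>>       throw new SearchException(e);
>>     }
>>     float termWeightedScore = 0;
>>     for (Map.Entry<String, Float> entry : queryTermWeights.entrySet()) {
>>       String docFieldValue = document.get(entry.getKey());
>>       Float weight = entry.getValue();
>>       if (docFieldValue != null && weight != null) {
>>         termWeightedScore += weight * Float.parseFloat(docFieldValue);
>>       }
>>     }
>>     // Blend the relevance score with the weighted sum of field values.
>>     return bias * subQueryScore + (1 - bias) * termWeightedScore;
>>   }
>> }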
>>
>> On Thu, Oct 8, 2009 at 7:54 AM, scott w <scottblanc@gmail.com> wrote:
>>
>>> I am trying to come up with a performant query that will allow me to
>>> use a custom score, where the custom score is a sum-product over a set
>>> of query-time weights and each weight is applied only if the query-time
>>> term exists in the document. For example, if I have a doc with three
>>> fields: company=Microsoft, city=Redmond, and size=large, I may want to
>>> score that document according to the following function:
>>> (company==Microsoft ? 0.3 : 0) + (size==large ? 0.5 : 0), to get a
>>> score of 0.8. Attached is a subclass I have tested that implements
>>> this, with one extra component: it allows the relevance score to be
>>> combined in.
>>>
>>> The problem is that this custom score is not performant at all. For
>>> example, on a small index of 5 million documents with 10 weights passed
>>> in, it does 0.01 req/sec.
>>>
>>> Are there ways to compute the same custom score in a much more
>>> performant way?
>>>
>>> thanks,
>>> Scott
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

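For readers of the archive: the quoted class reads the stored fields of
each document it scores and parses the values with Float.parseFloat. A
minimal sketch of one possible alternative follows, assuming the numeric
field values can be loaded once into one float array per field, indexed by
document id. The class ColumnarTermBooster and the indicator field names
company_microsoft and size_large are hypothetical and exist only for
illustration. In Lucene 2.9 such arrays could plausibly come from
FieldCache.DEFAULT.getFloats(reader, field), but that wiring is an
assumption and is not shown here. A missing value is treated as 0, which
contributes the same amount as the null check in the quoted code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: blends a relevance score with a weighted sum of per-document
 * numeric field values read from preloaded arrays instead of stored fields.
 * ColumnarTermBooster is a hypothetical name, not part of Lucene.
 */
final class ColumnarTermBooster {
    private final float[][] columns; // columns[i][doc] = value of field i for doc
    private final float[] weights;   // weights[i] = query-time weight of field i
    private final float bias;        // share of the relevance score, in [0, 1]

    ColumnarTermBooster(Map<String, float[]> fieldValues,
                        Map<String, Float> queryTermWeights,
                        float bias) {
        if (bias < 0 || bias > 1) {
            throw new IllegalArgumentException("Bias must be between 0 and 1");
        }
        this.bias = bias;
        // Resolve field names to arrays once, outside the per-document path.
        float[][] cols = new float[queryTermWeights.size()][];
        float[] w = new float[queryTermWeights.size()];
        int n = 0;
        for (Map.Entry<String, Float> e : queryTermWeights.entrySet()) {
            float[] col = fieldValues.get(e.getKey());
            if (col != null && e.getValue() != null) {
                cols[n] = col;
                w[n] = e.getValue();
                n++;
            }
        }
        // Drop slots for weights whose field has no preloaded column.
        this.columns = Arrays.copyOf(cols, n);
        this.weights = Arrays.copyOf(w, n);
    }

    /** Same formula as the quoted customScore, using array lookups only. */
    float customScore(int doc, float subQueryScore) {
        float termWeightedScore = 0f;
        for (int i = 0; i < columns.length; i++) {
            termWeightedScore += weights[i] * columns[i][doc];
        }
        return bias * subQueryScore + (1 - bias) * termWeightedScore;
    }

    public static void main(String[] args) {
        // Two documents; doc 0 is the Microsoft/large example from the thread.
        Map<String, float[]> values = new HashMap<>();
        values.put("company_microsoft", new float[] {1f, 0f});
        values.put("size_large", new float[] {1f, 1f});

        Map<String, Float> queryWeights = new HashMap<>();
        queryWeights.put("company_microsoft", 0.3f);
        queryWeights.put("size_large", 0.5f);

        // bias = 0 ignores the relevance score entirely.
        ColumnarTermBooster booster = new ColumnarTermBooster(values, queryWeights, 0f);
        System.out.println("doc 0: " + booster.customScore(0, 1.0f));
        System.out.println("doc 1: " + booster.customScore(1, 1.0f));
    }
}
```

In this sketch the per-document work is one multiply-add per weight, with
no document loading or string parsing. Whether that removes the bottleneck
in a real index would have to be measured, and the arrays cost roughly 4
bytes per document per field in memory.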