lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Anyway to not bother scoring less good matches ?
Date Wed, 04 May 2011 12:34:09 GMT
On 04/05/2011 12:51, Paul Taylor wrote:
> On 04/05/2011 12:39, Ahmet Arslan wrote:
>>
>> Im receiving a number of searches with many ORs so that the total 
>> number of matches is huge (>  1 million) although only the first 20 
>> results are required. Analysis shows most time is spent scoring the 
>> results. Now it seems to me if you sending a query with 10 OR 
>> components, documents that match most of the terms are bound to get a 
>> better score than a match that only matches one or two of the terms.  
>> So does lucene do any optimization to not bother working out the 
>> scores of the poor matches.
>>
>> EDIT:Actually not sure the statement because if only term matches it 
>> could still get the highest score if the match was on the shortest term.
>>
>> But can you see my point is there way to get lucene discount the less 
>> good matches without scoring them, or is there another approach. At 
>> the moment we allow the full lucene syntax and use QueryParser to 
>> parse a query and pass the resultant query to search unchanged 
>> (execpt for handling of numeric fields), should I be modifying the 
>> query somehow ?
>>
>>
>> You can restrict number of returned results by using a adaptively 
>> computed BooleanQuery.html#setMinimumNumberShouldMatch(int) parameter.
>> For example, If you have 10 optional clauses you can set minimum 
>> should match to 60% of 10 = 6.
>>
>> Similar mechanism exists in solr :
>> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29

>>
>>
>>
> Thanks for the hint, so this could be done by overriding 
> getBooleanQuery() in QueryParser ?
>
> Paul
>
Well I did extend QuerParser, and the method is being called but rather 
disappointingly it had no noticeablke effect on how long queries took. I 
really thought by reducing the number of matches the corresponding 
scoring phase would be quicker.

@Override
     protected Query getBooleanQuery(List<BooleanClause> clauses, 
boolean disableCoord)
         throws ParseException
     {
         BooleanQuery query = (BooleanQuery) 
super.getBooleanQuery(clauses,disableCoord);
         if(query!=null)
         {
             if(clauses.size() > 5)
             {
                 query.setMinimumNumberShouldMatch(3);
             }
         }
         return query;
     }


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message