lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Anyway to not bother scoring less good matches ?
Date Wed, 04 May 2011 11:51:57 GMT
On 04/05/2011 12:39, Ahmet Arslan wrote:
>
> I'm receiving a number of searches with many ORs, so the total number of matches is
> huge (> 1 million) although only the first 20 results are required. Analysis shows most
> time is spent scoring the results. It seems to me that if you send a query with 10 OR components,
> documents that match most of the terms are bound to get a better score than a document that
> matches only one or two of the terms. So does Lucene do any optimization to avoid working
> out the scores of the poor matches?
>
> EDIT: Actually, I'm not sure that statement holds, because a document matching only one term
> could still get the highest score if the match was on the shortest term.
>
> But can you see my point? Is there a way to get Lucene to discount the less good matches without
> scoring them, or is there another approach? At the moment we allow the full Lucene syntax
> and use QueryParser to parse a query, then pass the resulting query to search unchanged (except
> for handling of numeric fields). Should I be modifying the query somehow?
>
>
> You can restrict the number of returned results by using an adaptively computed
> BooleanQuery.setMinimumNumberShouldMatch(int) parameter.
> For example, if you have 10 optional clauses you can set minimum should match to 60%
> of 10 = 6.
>
> A similar mechanism exists in Solr:
> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
>
>
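The "60% of 10 = 6" arithmetic above can be sketched in plain Java. This is only an illustration: the class name MinShouldMatch and the method compute are hypothetical, and rounding down with integer arithmetic is an assumption, since the thread does not say how fractional results should be handled.

```java
/** Hypothetical helper: derives a minimum-should-match count from a percentage. */
public final class MinShouldMatch {
    private MinShouldMatch() {}

    /**
     * Returns how many of the optional clauses must match.
     * The result is rounded down (an assumption, not stated in the thread).
     */
    public static int compute(int optionalClauses, int percent) {
        if (optionalClauses < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("invalid clause count or percentage");
        }
        // Integer arithmetic avoids floating-point rounding surprises.
        return optionalClauses * percent / 100;
    }

    public static void main(String[] args) {
        // 10 optional clauses at 60% require 6 matching clauses.
        System.out.println(compute(10, 60));
    }
}
```

The resulting number is what would be passed to BooleanQuery.setMinimumNumberShouldMatch(int). With very few clauses the result can be 0, which leaves the query unrestricted.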
Thanks for the hint. So this could be done by overriding
getBooleanQuery() in QueryParser?
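One way such an override might look is sketched below, assuming the Lucene 3.x API (package org.apache.lucene.queryParser, mutable BooleanQuery) on the classpath. The class name MinMatchQueryParser and the percent parameter are hypothetical, and the rounding rule is an assumption; this is not a confirmed answer from the thread.

```java
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

/** Hypothetical QueryParser subclass that requires a percentage of optional clauses to match. */
public class MinMatchQueryParser extends QueryParser {
    private final int percent; // e.g. 60 means 60% of the SHOULD clauses

    public MinMatchQueryParser(Version matchVersion, String field, Analyzer analyzer, int percent) {
        super(matchVersion, field, analyzer);
        this.percent = percent;
    }

    @Override
    protected Query getBooleanQuery(List<BooleanClause> clauses, boolean disableCoord)
            throws ParseException {
        Query query = super.getBooleanQuery(clauses, disableCoord);
        if (query instanceof BooleanQuery) {
            BooleanQuery bq = (BooleanQuery) query;
            // Count only optional (SHOULD) clauses; MUST and MUST_NOT are unaffected.
            int optional = 0;
            for (BooleanClause clause : bq.getClauses()) {
                if (clause.getOccur() == BooleanClause.Occur.SHOULD) {
                    optional++;
                }
            }
            int min = optional * percent / 100; // rounded down (assumption)
            if (min > 0) {
                bq.setMinimumNumberShouldMatch(min);
            }
        }
        return query;
    }
}
```

Because the parser calls getBooleanQuery() for every boolean group, nested groups would each get their own threshold under this sketch. Later Lucene versions moved the parser to a different package and made BooleanQuery immutable, so the code would need adapting there.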

Paul


