lucene-java-user mailing list archives

From Matthew Willson
Subject Re: Long query optimisation: using some terms for scoring only
Date Tue, 11 Dec 2012 18:13:33 GMT
Hi lukai

That sounds like a nice optimisation, perhaps more sophisticated than 
the "AND_MAYBE" support I was looking for but a similar idea. Is the 
code available anywhere?


On 11/12/12 17:45, lukai wrote:
> I implemented WAND in Solr for our own project. It can improve
> performance a lot. For your reference:
> But it requires changing the index a little.
> Thanks,
> On Tue, Dec 11, 2012 at 6:19 AM, Matthew Willson wrote:
>> Hi all
>> I'm currently benchmarking Lucene to get an understanding of what
>> optimisations are available for long queries, and wanted to check what the
>> recommended approach is.
>> Unsurprisingly, a naive approach to long queries (just keep adding SHOULD
>> clauses to a big BooleanQuery) scales close to linearly in the number of
>> terms, which beyond a certain point isn't good enough.
>> The obvious solution is to prune the query in order to reduce the number
>> of documents which need scoring, and this is easy to do, but has the
>> downside that none of the pruned terms are used for scoring.
>> In Xapian there's a handy query operator called OP_AND_MAYBE, where only
>> terms on the left-hand-side are used to select documents, with terms on the
>> right-hand-side used for scoring only. This performs much better for long
>> queries if less discriminative terms are moved onto the right-hand-side.
>> I tried to replicate this approach in Lucene using the following query (in
>> QueryParser syntax):
>> +(some mandatory terms) and some other terms for scoring only
>> The presence of a MUST clause in the outer BooleanQuery forces the
>> remaining SHOULD clauses to be purely optional and not expand the set of
>> documents scored, so this has the right semantics. However the performance
>> benefit isn't there -- in a test with 200 query terms in total, it quickly
>> becomes slower than a plain flat BooleanQuery once the number of terms in
>> the mandatory part of the query exceeds 5 or so.
>> Interestingly, it's much faster (~40ms) when there's only one
>> mandatory term than when there are two terms in the mandatory clause
>> (~2500ms), which leads me to suspect an obvious optimisation is being
>> missed.
>> Anyone have any ideas on this, pointers to other relevant query types or
>> optimisations available in Lucene 4, or on which parts of the
>> Query/Weight/Scorer code we'd need to change to speed up this kind of thing?
>> Cheers
>> -Matt
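
The AND_MAYBE semantics described in the quoted message (mandatory terms select documents, the remaining terms only contribute to the score) can be sketched in plain Java without Lucene. This is a toy model under stated assumptions, not Lucene's or Xapian's implementation: the class AndMaybeSketch, its methods, the in-memory index and the match-count "score" are all hypothetical, and the mandatory side is assumed to be an OR of its terms, as in +(a b) with QueryParser's default operator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of AND_MAYBE-style matching; not a Lucene or Xapian API.
class AndMaybeSketch {

    // Tiny made-up inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> demoIndex() {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("lucene", new HashSet<>(Arrays.asList(1, 2)));
        index.put("xapian", new HashSet<>(Arrays.asList(4)));
        index.put("scoring", new HashSet<>(Arrays.asList(2, 3)));
        index.put("query", new HashSet<>(Arrays.asList(1, 2, 3)));
        return index;
    }

    // Returns doc id -> score. Only documents matching at least one mandatory
    // term are candidates; scoring-only terms never add documents to the result.
    static Map<Integer, Integer> search(Map<String, Set<Integer>> index,
                                        List<String> mandatory,
                                        List<String> scoringOnly) {
        // Selection step: union of the postings of the mandatory terms.
        Set<Integer> candidates = new TreeSet<>();
        for (String term : mandatory) {
            candidates.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        // Scoring step: every term, mandatory or not, contributes to the score
        // of a candidate. The score here is just the number of matching terms,
        // a stand-in for a real similarity function.
        Map<Integer, Integer> scores = new TreeMap<>();
        for (int doc : candidates) {
            int score = 0;
            for (String term : mandatory) {
                score += matches(index, term, doc);
            }
            for (String term : scoringOnly) {
                score += matches(index, term, doc);
            }
            scores.put(doc, score);
        }
        return scores;
    }

    // 1 if the document contains the term, 0 otherwise.
    static int matches(Map<String, Set<Integer>> index, String term, int doc) {
        return index.getOrDefault(term, Collections.<Integer>emptySet()).contains(doc) ? 1 : 0;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> result = search(
                demoIndex(),
                Arrays.asList("lucene", "xapian"),   // select documents
                Arrays.asList("scoring", "query"));  // scoring only
        System.out.println(result); // prints {1=2, 2=3, 4=1}
    }
}
```

Document 3 matches both scoring-only terms but is never scored, because it matches no mandatory term. In this model the per-query work is driven by the size of the candidate set, which is why moving less discriminative terms to the scoring-only side should help.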