lucene-java-user mailing list archives

From Spyros Kapnissis <>
Subject Re: BooleanQuery rewrite optimization
Date Mon, 08 Aug 2016 22:45:52 GMT
Hm, I hadn't really thought about the minShouldMatch part. I thought it'd be covered, but I
see your point about it being semantically different if you keep it as is.
However, running your edge-case example on an actual local index, I get the following:
"(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=2 => same top score, fewer results in the second case.
"(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=1 => same top score, same number of results.
"(X X X Y #X)" w/minshouldmatch=3 vs. "(+X X X Y)" w/minshouldmatch=2 => same top score, same number of results.
But I'm still not really convinced that decrementing minshouldmatch by 1 will do the trick.
I'll have to verify; maybe I'll try more examples to see if it holds as a general case.
Nice exercise either way :)
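
One way to "try more examples" is to brute-force a simplified model of the matching rules. The sketch below is not Lucene code. It assumes that a document is a set of terms, that every clause is a single term, that MUST and FILTER clauses are all required, and that minshouldmatch counts each matching SHOULD clause separately, duplicates included. The names `matches` and `all_docs` are made up for illustration. It checks only which documents match, not scores.

```python
from itertools import combinations, product

def matches(doc, must=(), should=(), filt=(), min_should_match=0):
    """Simplified boolean matching: doc is a set of terms, clauses are terms."""
    if not all(t in doc for t in (*must, *filt)):
        return False
    hits = sum(1 for t in should if t in doc)
    needed = min_should_match
    # Assumption: a query with no required clause needs at least one SHOULD hit.
    if not must and not filt:
        needed = max(needed, 1)
    return hits >= needed

def all_docs(terms):
    """Every possible document over the given terms (all subsets)."""
    return [frozenset(c) for r in range(len(terms) + 1)
            for c in combinations(terms, r)]

docs = all_docs("XYZ")
mismatches_decrement = 0  # rewrite with minshouldmatch - 1
mismatches_keep = 0       # rewrite with minshouldmatch unchanged

for n in range(1, 5):
    for should in product("XYZ", repeat=n):
        if "X" not in should:
            continue  # the rewrite only applies when X is both SHOULD and FILTER
        rest = list(should)
        rest.remove("X")  # promote one SHOULD X to MUST, absorbing the FILTER
        for msm in range(0, n + 2):
            for doc in docs:
                original = matches(doc, should=should, filt=("X",),
                                   min_should_match=msm)
                decremented = matches(doc, must=("X",), should=rest,
                                      min_should_match=max(msm - 1, 0))
                kept = matches(doc, must=("X",), should=rest,
                               min_should_match=msm)
                mismatches_decrement += original != decremented
                mismatches_keep += original != kept

print("mismatches with decrement:", mismatches_decrement)
print("mismatches without decrement:", mismatches_keep)
```

Under these assumptions the decremented rewrite never changes the match set, while keeping minshouldmatch unchanged does. That says nothing about scoring, and a real index remains the authority.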


    On Tuesday, August 9, 2016 12:40 AM, Chris Hostetter wrote:

Off the top of my head, i think any optimization like that would also need 
to account for minNrShouldMatch, wouldn't it?

if your query is "(X Y Z #X)" w/minshouldmatch=2, and you rewrite that 
query to "(+X Y Z)" w/minshouldmatch=2 you now have a semantically diff 
query that won't match as many documents as the original.

in that example, you could decrement minshouldmatch (=1) ... but i'm not 
sure if that holds as a general rule for all possible permutations/values 
... i'd have to think about it.
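
The "(X Y Z #X)" example can be worked through with a small model of the matching rules. This is only a sketch, not Lucene's API. It assumes documents are sets of terms, clauses are single terms, MUST and FILTER clauses are required, and minshouldmatch counts matching SHOULD clauses. The helper names are hypothetical.

```python
from itertools import combinations

def matches(doc, must=(), should=(), filt=(), min_should_match=0):
    """Simplified boolean matching: doc is a set of terms, clauses are terms."""
    if not all(t in doc for t in (*must, *filt)):
        return False
    hits = sum(1 for t in should if t in doc)
    needed = min_should_match
    # Assumption: a query with no required clause needs at least one SHOULD hit.
    if not must and not filt:
        needed = max(needed, 1)
    return hits >= needed

def all_docs(terms):
    """Every possible document over the given terms (all subsets)."""
    return [frozenset(c) for r in range(len(terms) + 1)
            for c in combinations(terms, r)]

docs = all_docs("XYZ")

# "(X Y Z #X)" w/minshouldmatch=2
original = {d for d in docs
            if matches(d, should="XYZ", filt="X", min_should_match=2)}
# "(+X Y Z)" w/minshouldmatch=2
same_msm = {d for d in docs
            if matches(d, must="X", should="YZ", min_should_match=2)}
# "(+X Y Z)" w/minshouldmatch=1
lower_msm = {d for d in docs
             if matches(d, must="X", should="YZ", min_should_match=1)}

print(len(original), len(same_msm), len(lower_msm))  # 3 1 3
```

In this model the rewrite with minshouldmatch=2 matches only the document containing X, Y and Z, while the decremented version matches the same three documents as the original.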

An interesting edge case to think about is "(X X Y #X)" w/minshouldmatch=2 
... pretty sure that would give you very diff scores if you rewrote it to 
"(+X X Y)" (or "(+X Y)") w/minshouldmatch=1
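
A rough way to reason about the scores in this edge case is a toy model in which every matching MUST or SHOULD clause adds a fixed per-term weight and FILTER clauses add nothing. This is an assumption made for illustration. It ignores everything else a real similarity does (coord, norms, and so on), and the weights and the `score` helper are made up.

```python
def score(doc, weights, must=(), should=(), filt=(), min_should_match=0):
    """Toy score: sum of weights of matching MUST and SHOULD clauses.

    Returns None when the document does not match. FILTER clauses are
    required but contribute nothing to the score.
    """
    if not all(t in doc for t in (*must, *filt)):
        return None
    matched = [t for t in should if t in doc]
    if len(matched) < min_should_match:
        return None
    return sum(weights[t] for t in must) + sum(weights[t] for t in matched)

weights = {"X": 1.5, "Y": 0.75}  # arbitrary illustrative weights
results = {}

for doc in (frozenset("X"), frozenset("XY")):
    # "(X X Y #X)" w/minshouldmatch=2
    original = score(doc, weights, should="XXY", filt="X", min_should_match=2)
    # "(+X X Y)" w/minshouldmatch=1
    keep_dup = score(doc, weights, must="X", should="XY", min_should_match=1)
    # "(+X Y)" w/minshouldmatch=1
    drop_dup = score(doc, weights, must="X", should="Y", min_should_match=1)
    results[tuple(sorted(doc))] = (original, keep_dup, drop_dup)

for terms, scores in results.items():
    print(terms, scores)
```

In this toy model "(+X X Y)" w/minshouldmatch=1 reproduces the original scores, whereas "(+X Y)" both lowers the score and stops matching documents that contain only X. Whether real Lucene scoring behaves the same way would need to be checked against an index.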

: Hello all, I noticed while debugging a query that BooleanQuery will 
: rewrite itself to remove FILTER clauses that are also MUST as an 
: optimization/simplification, which makes total sense. So (+f:x #f:x) 
: will become (+f:x). However, shouldn't there also be another 
: optimization to remove FILTER clauses that are also SHOULD, while 
: converting them to MUST? So, e.g., the query (f:x #f:x) will become 
: (+f:x). I did an initial simple implementation and the tests seem to 
: pass. Are there any cases where this does not hold? 

