lucene-java-user mailing list archives

From Adrien Grand <>
Subject Re: BooleanQuery rewrite optimization
Date Wed, 10 Aug 2016 09:52:49 GMT
I'm not awake enough to figure out whether the -1 trick is right or not,
but if you manage to prove it somehow, patches to simplify boolean queries
at rewrite time are welcome!

On Tue, Aug 9, 2016 at 00:47, Spyros Kapnissis <> wrote:

> Hm, I hadn't really thought about the minShouldMatch part. I thought it'd
> be covered, but I see your point about the query being semantically
> different if you keep it as is.
> However, running your edge case example on an actual local index, I get
> the following:
> "(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=2
>   => same top score, fewer results in second case.
> "(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=1
>   => same top score, same number of results.
> "(X X X Y #X)" w/minshouldmatch=3 vs. "(+X X X Y)" w/minshouldmatch=2
>   => same top score, same number of results.
> But still not really convinced myself if decrementing minshouldmatch by 1
> will do the trick.. I'll have to verify - maybe I'll try more examples to
> see if it holds as a general case.. Nice exercise either way :)
> On Tuesday, August 9, 2016 12:40 AM, Chris Hostetter <> wrote:
> Off the top of my head, i think any optimization like that would also need
> to account for minNrShouldMatch, wouldn't it?
> if your query is "(X Y Z #X)" w/minshouldmatch=2, and you rewrite that
> query to "(+X Y Z)" w/minshouldmatch=2 you now have a semantically diff
> query that won't match as many documents as the original.
> in that example, you could decrement minshouldmatch (=1) ... but i'm not
> sure if that holds as a general rule for all possible permutations/values
> ... i'd have to think about it.
> An interesting edge case to think about is "(X X Y #X)" w/minshouldmatch=2
> ... pretty sure that would give you very diff scores if you rewrote it to
> "(+X X Y)" (or "(+X Y)") w/minshouldmatch=1
> : Hello all, I noticed while debugging a query that BooleanQuery will
> : rewrite itself to remove FILTER clauses that are also MUST as an
> : optimization/simplification, which makes total sense. So (+f:x #f:x)
> : will become (+f:x). However, shouldn't there also be another
> : optimization to remove FILTER clauses that are also SHOULD, while
> : converting them to MUST? So, e.g., the query (f:x #f:x) would become
> : (+f:x). I did an initial simple implementation and the tests seem to
> : pass. Are there any cases where this does not hold?
> :
> :
> -Hoss
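
The cases quoted in this thread can be checked with a small brute-force model. What follows is only a sketch under stated assumptions, not Lucene code: it models matching only (no scoring), uses two terms X and Y, treats FILTER and MUST alike as "required", and counts duplicate SHOULD clauses separately toward minShouldMatch. The class and method names (MinShouldMatchCheck, matches, sameMatches) are hypothetical and exist only for this illustration.

```java
import java.util.List;

// Toy model of BooleanQuery *matching* only (no scoring, no Lucene dependency).
// A "document" is the pair (hasX, hasY); each clause is the term 'X' or 'Y'.
// Assumption: every query modeled here has at least one required clause.
class MinShouldMatchCheck {

    static boolean has(char term, boolean hasX, boolean hasY) {
        return term == 'X' ? hasX : hasY;
    }

    // True if the doc satisfies every required clause (MUST or FILTER) and at
    // least minShouldMatch of the SHOULD clauses; duplicates count separately.
    static boolean matches(List<Character> required, List<Character> should,
                           int minShouldMatch, boolean hasX, boolean hasY) {
        for (char t : required) {
            if (!has(t, hasX, hasY)) return false;
        }
        int hits = 0;
        for (char t : should) {
            if (has(t, hasX, hasY)) hits++;
        }
        return hits >= minShouldMatch;
    }

    // Compares two queries that both require X (as FILTER in the original,
    // as MUST in the rewrite) over all four possible documents.
    static boolean sameMatches(List<Character> origShould, int origMsm,
                               List<Character> newShould, int newMsm) {
        List<Character> requiredX = List.of('X');
        boolean[] values = {false, true};
        for (boolean hasX : values) {
            for (boolean hasY : values) {
                boolean a = matches(requiredX, origShould, origMsm, hasX, hasY);
                boolean b = matches(requiredX, newShould, newMsm, hasX, hasY);
                if (a != b) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // (X X Y #X) msm=2  vs.  (+X X Y) msm=2
        System.out.println(sameMatches(List.of('X', 'X', 'Y'), 2, List.of('X', 'Y'), 2));
        // (X X Y #X) msm=2  vs.  (+X X Y) msm=1
        System.out.println(sameMatches(List.of('X', 'X', 'Y'), 2, List.of('X', 'Y'), 1));
        // (X X X Y #X) msm=3  vs.  (+X X X Y) msm=2
        System.out.println(sameMatches(List.of('X', 'X', 'X', 'Y'), 3, List.of('X', 'X', 'Y'), 2));
    }
}
```

In this toy model the first pair differs in its match set and the other two agree, which is in line with the result counts reported above. It says nothing about scores, and it is not a proof that decrementing minShouldMatch by 1 is safe for every clause combination.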