lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Extending query parser with MinShouldMatch syntax
Date Tue, 16 Sep 2008 20:47:00 GMT

: Suppose that I propose a file-type filter to the user, and the user typed
: some keywords, like "hello world". The user gets back results, and he now
: wants to filter those results by selecting "PDF" from the file-type filter. The
: only query the client application can send to the back-end is "hello world
: +filetype:pdf". But that doesn't work as expected. If queries are run with
: OR operator as the default, then the documents that will be returned are
: those that include filetype:pdf, and may or may not include "hello world".
: This is not what the user expected though.

I'm really not understanding what that example has to do with 
minShouldMatch ... the fundamental problem in your example is that if you 
start with a query for...
	"hello world"
...and then want to restrict it to only docs that also match...
	filetype:pdf
...the combined query must have *both* clauses marked as mandatory...
	+"hello world" +filetype:pdf

minShouldMatch doesn't even factor in at all.
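
For illustration, here is a minimal sketch of how a client application could build that combined query string. The class and method names are hypothetical, and it assumes the filter value needs no escaping; it only wraps the user's input in a mandatory group and appends a mandatory filter clause.

```java
// Hypothetical helper, not part of Lucene: builds a query string in which
// both the user's original query and the filter clause are mandatory.
final class FilterQuerySketch {

    // Wraps the user's query in "+( ... )" so the whole thing is required,
    // then appends "+field:value" as a second required clause.
    // Assumption: field and value contain no characters that need escaping.
    static String restrict(String userQuery, String field, String value) {
        return "+(" + userQuery + ") +" + field + ":" + value;
    }
}
```

Calling `FilterQuerySketch.restrict("\"hello world\"", "filetype", "pdf")` yields `+("hello world") +filetype:pdf`, which is equivalent to the query above.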

Independent of that, if you want to add minShouldMatch support to 
QueryParser, there are two fairly straightforward ways to go, depending on 
how generalized you want support to be...

1) minShouldMatch set on all BooleanQueries (as a function of length)  

This is the approach the DisMaxQueryParser in Solr takes ... you override 
the getBooleanQuery method in QueryParser, delegate to super, and then 
modify the BooleanQuery returned, setting minShouldMatch based on some 
function of the number of clauses it already contains.  The version in 
Solr supports a grammar for deciding what it should be relative to various 
cut-off points as either an absolute number or a percentage...
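
The "function of the number of clauses" part could look something like the sketch below. This is plain Java with no Lucene dependency, and it is deliberately much simpler than the grammar Solr supports: the class name, the `compute` method, and the spec format (a bare integer or an integer followed by `%`) are assumptions made for illustration only. The result is the kind of value you would pass to `BooleanQuery.setMinimumNumberShouldMatch` inside the overridden getBooleanQuery.

```java
// Hypothetical helper: derives a minShouldMatch value from the number of
// optional clauses and a simple spec. Not Solr's actual grammar.
final class MinShouldMatchSketch {

    // spec is either an absolute number ("2") or a percentage ("75%").
    // Percentages are truncated toward zero by integer division (an
    // assumption), and the result is clamped to [0, optionalClauses].
    static int compute(int optionalClauses, String spec) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            String digits = s.substring(0, s.length() - 1).trim();
            int percent = Integer.parseInt(digits);
            result = (optionalClauses * percent) / 100;
        } else {
            result = Integer.parseInt(s);
        }
        return Math.max(0, Math.min(optionalClauses, result));
    }
}
```

With this sketch, a five-clause query and a spec of `75%` requires 3 of the optional clauses to match, while a spec of `3` on a two-clause query is clamped down to 2.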

2) overload the use of "~" in the parser grammar

Instead of adding a new special character to the grammar (I think you 
suggested '#'), which could break back compatibility, you might want to 
consider modifying the grammar to recognize the '~' character when it 
follows a close paren as an indication of minShouldMatch on the boolean 
query those parens wrap.  Since '~' is currently used for specifying 
slop on phrase queries and fuzziness on fuzzy queries, it's already a 
reserved character.
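
To make the proposed syntax concrete, here is a toy sketch of what recognizing `(...)~N` might look like. It is not a change to the real QueryParser grammar: it is a hypothetical pre-parsing step in plain Java that only handles a single outermost group spanning the whole query string, and it ignores quoting and escaping. All names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the "(...)~N" syntax; not Lucene code.
final class TildeSuffixSketch {

    // Matches a parenthesized group followed by ~N that spans the whole string.
    private static final Pattern GROUP_WITH_MIN =
        Pattern.compile("^\\((.*)\\)~(\\d{1,9})$", Pattern.DOTALL);

    final String body;        // query text to hand to the regular parser
    final int minShouldMatch; // 0 means "no minShouldMatch requested"

    private TildeSuffixSketch(String body, int minShouldMatch) {
        this.body = body;
        this.minShouldMatch = minShouldMatch;
    }

    static TildeSuffixSketch parse(String query) {
        String trimmed = query.trim();
        Matcher m = GROUP_WITH_MIN.matcher(trimmed);
        // Only accept the match if the parens inside the group are balanced,
        // so that "(a b)~2 (c d)~1" is not mistaken for one big group.
        if (m.matches() && isBalanced(m.group(1))) {
            return new TildeSuffixSketch(m.group(1), Integer.parseInt(m.group(2)));
        }
        return new TildeSuffixSketch(trimmed, 0);
    }

    // True if paren depth never goes negative and ends at zero.
    private static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') depth++;
            if (c == ')' && --depth < 0) return false;
        }
        return depth == 0;
    }
}
```

Under these assumptions, `(hello world lucene)~2` would be read as the boolean query `hello world lucene` with a minShouldMatch of 2, while a plain `hello world` is passed through unchanged.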

