lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: How is Number of Boolean Clauses calculated - Minimum Should Match?
Date Mon, 10 Oct 2011 18:18:43 GMT

: From my understanding, this could also be dangerous for queries that
: reduce the number of tokens.
: Imagine: Search Engine => SE (reduced to SE).
: This should have the same impact on the min should match as a stopword, no?

Not really ... assuming you mean *query*-based synonyms, then a multiword 
synonym used in the query string isn't going to be respected unless it's 
explicitly quoted, because each "chunk" of query parser input is analyzed 
independently.  (remember: the QueryParser parses according to its own 
meta-characters -- including whitespace -- before passing any parts of the 
input to the individual analyzers)
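To see the "parse on whitespace first, analyze each chunk afterwards" order in isolation, here is a toy Java model. It is a sketch under loud assumptions: the class, the quote handling, the lowercase-and-split "analyzer" and the one-entry synonym map are all hypothetical. None of it is Lucene's QueryParser or Analyzer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy model only: not Lucene's QueryParser and not a real Analyzer.
class ChunkedAnalysisDemo {

    // Hypothetical query-time synonym rule that collapses a two-word phrase to one token.
    static final Map<String, String> SYNONYMS = Map.of("search engine", "se");

    // Split the raw query on whitespace, keeping double-quoted spans together
    // (the quote characters themselves are dropped). This mimics the parser
    // handling its own meta-characters before any analyzer sees the text.
    static List<String> chunks(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (cur.length() > 0) {
                    out.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    // Toy analyzer, applied to ONE chunk at a time: lowercase, apply the
    // synonym rule if the whole chunk matches it, otherwise split on spaces.
    static List<String> analyze(String chunk) {
        String lower = chunk.toLowerCase();
        String synonym = SYNONYMS.get(lower);
        if (synonym != null) {
            return List.of(synonym);
        }
        return Arrays.asList(lower.split("\\s+"));
    }

    public static void main(String[] args) {
        for (String q : new String[] {"Search Engine", "\"Search Engine\""}) {
            List<String> perChunk = new ArrayList<>();
            for (String chunk : chunks(q)) {
                perChunk.add(analyze(chunk).toString());
            }
            System.out.println(q + " -> " + perChunk);
        }
    }
}
```

Running it prints `Search Engine -> [[search], [engine]]` and `"Search Engine" -> [[se]]`. Unquoted, the analyzer only ever sees "Search" and "Engine" separately, so the multiword rule never fires. Quoted, it sees the whole phrase as one chunk.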

Even if it is quoted, and it reduces to one term in fieldA but remains 
two terms in fieldB, the number of clauses isn't affected: each chunk 
produces exactly one DisjunctionMaxQuery (built from whatever that chunk 
analyzed to in each field), and those DisjunctionMaxQuery objects are the 
clauses of the top level BooleanQuery.
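A rough Java model of that clause counting, under stated assumptions: two hypothetical fields (fieldA drops "the" and collapses "search engine" to "se"; fieldB only lowercases and splits), and the simplifying assumption that a chunk which yields no terms in any field contributes no clause. The `minShouldMatch` helper only approximates the plain integer and percentage forms of Solr's documented mm parameter (percentages rounded down, conditional specs like "3<90%" omitted). It is illustrative, not Solr's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: stands in for how dismax turns chunks into clauses.
class DismaxClauseCountDemo {

    // Hypothetical fieldA analysis: drops "the", collapses "search engine" to "se".
    static List<String> analyzeFieldA(String chunk) {
        String lower = chunk.toLowerCase();
        if (lower.equals("search engine")) {
            return List.of("se");
        }
        List<String> out = new ArrayList<>();
        for (String t : lower.split("\\s+")) {
            if (!t.equals("the")) {
                out.add(t);
            }
        }
        return out;
    }

    // Hypothetical fieldB analysis: lowercase and split, nothing else.
    static List<String> analyzeFieldB(String chunk) {
        return Arrays.asList(chunk.toLowerCase().split("\\s+"));
    }

    // One clause per chunk, however many terms it produced in each field.
    // A chunk that produces nothing in every field is assumed to add no clause.
    static int countClauses(List<String> chunks) {
        int clauses = 0;
        for (String chunk : chunks) {
            if (!analyzeFieldA(chunk).isEmpty() || !analyzeFieldB(chunk).isEmpty()) {
                clauses++;
            }
        }
        return clauses;
    }

    // Simplified mm: "N", "-N", "P%", "-P%"; the percentage is rounded down,
    // and the result is clamped to the range [0, clauses].
    static int minShouldMatch(int clauses, String spec) {
        int result;
        if (spec.endsWith("%")) {
            int pct = Integer.parseInt(spec.substring(0, spec.length() - 1));
            int n = Math.abs(pct) * clauses / 100;
            result = pct < 0 ? clauses - n : n;
        } else {
            int n = Integer.parseInt(spec);
            result = n < 0 ? clauses + n : n;
        }
        return Math.max(0, Math.min(clauses, result));
    }

    public static void main(String[] args) {
        // Chunks as a whitespace-first parser would hand them over (quotes stripped).
        List<String> chunks = List.of("search engine", "the", "ranking");
        int clauses = countClauses(chunks);
        System.out.println("clauses = " + clauses);
        System.out.println("mm 75% -> " + minShouldMatch(clauses, "75%"));
    }
}
```

This prints `clauses = 3` and `mm 75% -> 2`. "search engine" is one clause even though it is one term in fieldA and two in fieldB, and "the" still counts as a clause because fieldB keeps it, even though fieldA discards it.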

: What if I remove a stopword but add another token when synonyms come in?

try it ... you'll see what i mean.

(when it comes to query parsing, no amount of textual description can 
substitute for first-hand experience and experimentation -- i've written 
documentation, blogs, emails ... i've even done training classes where i've 
discussed this specific thing for ~1 hour -- nothing makes it hit home 
like having people sit down and actually play with the config and see the 
output)



-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

