lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Minimum Should Match the other way round
Date Thu, 22 Apr 2010 01:15:56 GMT

: However, maybe I misunderstood your point:
: "- Pick MAX_LEN Based On Number Of Query Clauses From Super" 
: since I thought that the number of query clauses depends on the number of
: whitespace-separated words in my query. If I am wrong, and it depends on the
: output of my analyzer chain, there is no problem. But I am not sure whether
: that is the case.

In a typical situation, asking the Lucene QueryParser to parse something 
like...

	how now 3G Cow

...is going to produce a BooleanQuery consisting of 4 BooleanClauses. It 
won't matter if you have the word delimiter filter (WDF) set up to split 
"3G" into "3","G" -- the QueryParser will use those to construct a 
PhraseQuery, which will be one of those 4 BooleanClauses.

Likewise, something like this...

	how (now 3G cow)

...will be parsed by the Lucene QueryParser into a BooleanQuery with only 
two clauses -- the second clause will be a BooleanQuery consisting of 3 
clauses (and, assuming WDF, the second of *those* clauses will be a 
PhraseQuery consisting of two terms).
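To make the clause-versus-term distinction concrete, here is a toy Java 
model. It is NOT Lucene's QueryParser: the Term/Phrase/Group types and the 
letter/digit splitting rule are hypothetical stand-ins for TermQuery, 
PhraseQuery, BooleanQuery and WDF. It only sketches the point above: 
whitespace and parentheses decide how many clauses you get, while the 
analyzer only decides what goes inside each clause.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT the real Lucene classes) of the query tree shape.
class ToyQueryParser {
    interface Node {}
    record Term(String text) implements Node {}           // ~ TermQuery
    record Phrase(List<String> terms) implements Node {}  // ~ PhraseQuery
    record Group(List<Node> clauses) implements Node {}   // ~ BooleanQuery

    private final String input;
    private int pos = 0;

    private ToyQueryParser(String input) { this.input = input; }

    static Group parse(String query) {
        return new ToyQueryParser(query).parseGroup();
    }

    // Whitespace separates clauses; '(' opens a nested group, ')' closes it.
    private Group parseGroup() {
        List<Node> clauses = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; }
            else if (c == '(') { pos++; clauses.add(parseGroup()); }
            else if (c == ')') { pos++; break; }
            else { clauses.add(analyze(readWord())); }
        }
        return new Group(clauses);
    }

    private String readWord() {
        int start = pos;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos++;
        }
        return input.substring(start, pos);
    }

    // Hypothetical WDF-like step: lowercase, split at letter/digit boundaries.
    // One token -> Term, several tokens -> Phrase; either way it is ONE clause.
    static Node analyze(String word) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : word.toLowerCase().toCharArray()) {
            if (cur.length() > 0
                    && Character.isDigit(c) != Character.isDigit(cur.charAt(cur.length() - 1))) {
                tokens.add(cur.toString());
                cur.setLength(0);
            }
            cur.append(c);
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens.size() == 1 ? new Term(tokens.get(0)) : new Phrase(tokens);
    }

    public static void main(String[] args) {
        Group flat = parse("how now 3G Cow");
        System.out.println(flat.clauses().size());                     // 4
        System.out.println(((Phrase) flat.clauses().get(2)).terms());  // [3, g]
        Group nested = parse("how (now 3G cow)");
        System.out.println(nested.clauses().size());                   // 2
        System.out.println(((Group) nested.clauses().get(1)).clauses().size()); // 3
    }
}
```

Note that "3G" becomes two terms but still only one clause, which is why a 
clause count and a term count can disagree.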

In your situation, you have a few options -- the right one depends on 
the specific behavior you want:

1) you could "walk" the Query structure, count the number of actual 
terms involved, and use that to pick your MAX_LEN

2) you could use a different QParser (other than the LuceneQParser) that 
has simpler behavior -- for example, the FieldQParser will produce either 
a simple TermQuery or a simple PhraseQuery, making it very easy to count 
the terms.

3) you could subclass a QParser with complex rules, but then make your own 
pass at parsing the data to compute the MAX_LEN param (i.e., subclass 
LuceneQParser, but add a length filter based on what FieldQParser says)
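Option 1 amounts to a recursive walk. The sketch below uses hypothetical 
toy types rather than real Lucene classes; in actual Lucene code the same 
recursion would go roughly through BooleanQuery.clauses(), 
BooleanClause.getQuery() and PhraseQuery.getTerms(). The tree is built by 
hand to match the "how (now 3G cow)" example with WDF splitting "3G".

```java
import java.util.List;

// Hypothetical toy types standing in for TermQuery, PhraseQuery and
// BooleanQuery; they are NOT the real Lucene classes.
class TermCounter {
    interface Node {}
    record Term(String text) implements Node {}
    record Phrase(List<String> terms) implements Node {}
    record Bool(List<Node> clauses) implements Node {}

    // Count leaf terms: a Term counts once, a Phrase counts each of its
    // terms, and a Bool recurses into all of its (possibly nested) clauses.
    static int countTerms(Node node) {
        if (node instanceof Term) {
            return 1;
        } else if (node instanceof Phrase p) {
            return p.terms().size();
        } else if (node instanceof Bool b) {
            int total = 0;
            for (Node clause : b.clauses()) total += countTerms(clause);
            return total;
        }
        throw new IllegalArgumentException("unknown node: " + node);
    }

    public static void main(String[] args) {
        // Shape of "how (now 3G cow)": 2 top-level clauses, 5 terms overall.
        Node nested = new Bool(List.of(
                new Term("how"),
                new Bool(List.of(
                        new Term("now"),
                        new Phrase(List.of("3", "g")),
                        new Term("cow")))));
        int maxLen = countTerms(nested);
        System.out.println(maxLen); // 5
    }
}
```

Whether 5 (terms) or 2 (top-level clauses) is the right MAX_LEN depends on 
how the length value was computed at index time; the two must use the same 
rule.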


When I've seen this sort of thing done in the past, the idea is to use an 
extremely simple set of rules for the "length" restriction -- even if you 
are using WDF on the "title" field to increase the number of things that 
are matched on, that doesn't mean you have to use it on the "title_length" 
field.
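A minimal sketch of that idea, assuming an integer "title_length" field and 
a plain whitespace-count rule (both assumptions; SimpleLength, simpleLength 
and lengthFilter are hypothetical names). The same rule computes the indexed 
length and the query-time MAX_LEN, and the result is expressed with Solr's 
"[* TO N]" range syntax, e.g. for use as a filter query.

```java
// Hypothetical helper: one deliberately simple "length" rule shared by
// indexing (title_length) and querying (MAX_LEN), independent of whatever
// analysis (e.g. WDF) the "title" field itself uses.
class SimpleLength {
    // Length = number of whitespace-separated words; "3G" counts as one.
    static int simpleLength(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return 0;
        return trimmed.split("\\s+").length;
    }

    // Range filter meaning "title_length <= MAX_LEN" (field name assumed).
    static String lengthFilter(String userQuery) {
        return "title_length:[* TO " + simpleLength(userQuery) + "]";
    }

    public static void main(String[] args) {
        // Index side: the value you would store in title_length.
        System.out.println(simpleLength("How Now Brown Cow")); // 4
        // Query side: the filter restricting matches to short-enough titles.
        System.out.println(lengthFilter("how now 3G Cow"));    // title_length:[* TO 4]
    }
}
```

Because WDF never enters this rule, "3G" counts as one word on both sides, 
so the indexed length and MAX_LEN stay comparable.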



-Hoss

