As I understood it setMinimumNumberShouldMatch(int min) Is used to
specify a minimum number of the optional BooleanClauses which must be
satisfied.
I haven't seen the implementation of setMinimumNumberShouldMatch but
it seems a bit different than what is intended with the WAND operator,
which can take any real number as threshold θ
As stated in the paper:
WAND(X1,w1, . . . Xk,wk, θ) is true iff X 1≤i≤k and SUM(xiwi)≥ θ
where xi is the indicator variable for Xi, that is xi = 1, if Xi is
true 0, otherwise.
Observe that WAND can be used to implement AND
and OR via
AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k),
and
OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1).
What I find interesting is the idea of using a first pass using the
upper bound (maximal) contribution of a term on any document score and
the dynamic setting of the threshold θ to skip or to fully evaluate a
document..
As stated in the paper:
"Given this setup our preliminary scoring consists of evaluating
for each document d
WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
where Xi is an indicator variable for the presence of query term i in
document d and the threshold θ is varied during
the algorithm as explained below. If WAND evaluates to true, then the
document d undergoes a full evaluation.
The threshold θ is set dynamically by the algorithm based on the
minimum score m among the top n results found so
far, where n is the number of requested documents. The larger the
threshold, the more documents will be skipped
and thus we will need to compute full scores for fewer documents."
I think its worth a try...
 Joaquin
On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <ab@getopt.org> wrote:
>
> J. Delgado wrote:
>>
>> Here is the link to the paper.
>> http://cis.poly.edu/westlab/papers/cntdstrb/p426broder.pdf
>>
>> A more recent application of the use and extension of the WAND operator for
>> indexing of Boolean expressions:
>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
>>
>>  Joaquin
>>
>>
>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
>> shalinmangar@gmail.com> wrote:
>>
>>> Hey Joaquin,
>>>
>>> The mailing list strips off attachments. Can you please upload it somewhere
>>> and give us the link?
>>>
>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
>>>>
>>>> wrote:
>>>> Please find attached the paper on "Efficient Query Evaluation using a
>>>> TwoLevel Retrieval Process". I believe that such approach may improve
>>>
>>> the
>>>>
>>>> way Lucene/Solr evaluates queries today.
>
> The functionality of WAND (weak AND) is already implemented in Lucene, if I understand
it correctly  this is the BooleanQuery.setMinShouldMatch(int). Lucene implements this probably
differently from the algorithm described in the paper, so there may be still some benefits
from comparing the algorithms in Lucene's BooleanScorer[2] with this one ...
>
>
> 
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__  ____/__\/ Information Retrieval, Semantic Web
> _____ \   Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>

To unsubscribe, email: javadevunsubscribe@lucene.apache.org
For additional commands, email: javadevhelp@lucene.apache.org
