lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7434) Add minNumberShouldMatch parameter to SpanNearQuery
Date Fri, 02 Sep 2016 16:59:21 GMT


Paul Elschot commented on LUCENE-7434:

bq.  Is my proposed approach flawed for the minNumberShouldMatch component ... ?

Looking at the code on GitHub, it uses NearSpansOrdered and NearSpansUnOrdered with all subSpans, as usual; see lines 277/278.

I think that is too strict when more than the required number of subSpans are actually
present in the segment.
The check for the presence of subSpans should be done at the document level, and even then fewer subSpans
than are present might match within the given slop/window.
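A minimal sketch of the document-level presence check described above, assuming each query term's positions in one document are already available as a sorted list (empty when the term is absent). The class and method names (`DocLevelPrecheck`, `presentTermCount`, `mayMatch`) are hypothetical and are not Lucene API; the real implementation would work on Spans/postings rather than plain lists.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical document-level pre-check (not Lucene API).
 * Each query term maps to its positions in one document; an empty list means absent.
 */
public class DocLevelPrecheck {

    /** Counts how many query terms occur at least once in the document. */
    static int presentTermCount(Map<String, List<Integer>> positionsByTerm) {
        int present = 0;
        for (List<Integer> positions : positionsByTerm.values()) {
            if (!positions.isEmpty()) {
                present++;
            }
        }
        return present;
    }

    /**
     * Necessary but not sufficient: even when enough terms are present,
     * fewer of them may fall within the required slop/window.
     */
    static boolean mayMatch(Map<String, List<Integer>> positionsByTerm, int minShouldMatch) {
        return presentTermCount(positionsByTerm) >= minShouldMatch;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc = Map.of(
            "t1", List.of(3, 40),
            "t2", List.of(5),
            "t3", List.of(),
            "t4", List.of(60));
        System.out.println(mayMatch(doc, 3)); // true: t1, t2 and t4 are present
        System.out.println(mayMatch(doc, 4)); // false: t3 is absent
    }
}
```

Only documents that pass this pre-check would need the more expensive positional window test; in the example, t1, t2 and t4 are present, but whether any three of them are close enough is a separate question.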

The (untested) all-pairs code above tries to do that, but only for pairs of subSpans.

> Add minNumberShouldMatch parameter to SpanNearQuery
> ---------------------------------------------------
>                 Key: LUCENE-7434
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: AllPairsNearSpans20160902.patch
> On the user list, [~saar32] asked about a new type of SpanQuery that would allow for something like BooleanQuery's minimumNumberShouldMatch
> bq. Given a set of search terms (t1, t2, t3, ti), return all documents where in a sequence of x=10 tokens at least c=3 of the search terms appear within the sequence.
> I _think_ we can modify SpanNearQuery fairly easily to accommodate this.  I'll submit a PR in the next few days.
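A minimal sketch of the requested semantics ("in a sequence of x tokens at least c of the search terms appear"), assuming the positions of each term in one document are already known. It uses a plain sliding window over the merged positions; it is not Lucene API and not the approach in the attached patch, and how the window width maps onto SpanNearQuery's slop is left open here. The names `MinShouldMatchWindow` and `matches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene API, not the attached patch): does some window of
 * `width` consecutive token positions contain at least `minShouldMatch` distinct terms?
 */
public class MinShouldMatchWindow {

    /** termPositions.get(i) holds the positions of query term i; width must be >= 1. */
    static boolean matches(List<List<Integer>> termPositions, int width, int minShouldMatch) {
        // Flatten into (position, termIndex) pairs sorted by position.
        List<int[]> occ = new ArrayList<>();
        for (int term = 0; term < termPositions.size(); term++) {
            for (int pos : termPositions.get(term)) {
                occ.add(new int[] {pos, term});
            }
        }
        occ.sort(Comparator.comparingInt(o -> o[0]));

        int[] countInWindow = new int[termPositions.size()];
        int distinct = 0; // number of distinct terms currently inside the window
        int left = 0;
        for (int right = 0; right < occ.size(); right++) {
            int[] added = occ.get(right);
            if (countInWindow[added[1]]++ == 0) {
                distinct++;
            }
            // Drop occurrences from the left until the window spans at most `width` tokens.
            while (added[0] - occ.get(left)[0] >= width) {
                int[] removed = occ.get(left++);
                if (--countInWindow[removed[1]] == 0) {
                    distinct--;
                }
            }
            if (distinct >= minShouldMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<Integer>> doc = List.of(
            List.of(0, 50),   // t1
            List.of(4),       // t2
            List.of(30),      // t3
            List.of(9, 55));  // t4
        System.out.println(matches(doc, 10, 3)); // true: t1@0, t2@4, t4@9 fit in 10 tokens
        System.out.println(matches(doc, 9, 3));  // false: no 9-token window holds 3 terms
    }
}
```

Unlike NearSpansOrdered/NearSpansUnOrdered over all subSpans, this check succeeds as soon as any `minShouldMatch` of the terms fall inside one window, so absent or distant terms do not prevent a match.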

This message was sent by Atlassian JIRA

