lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-7434) Add minNumberShouldMatch parameter to SpanNearQuery
Date Fri, 02 Sep 2016 17:00:24 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459041#comment-15459041 ]

Paul Elschot edited comment on LUCENE-7434 at 9/2/16 5:00 PM:
--------------------------------------------------------------

bq.  Is my proposed approach flawed for the minNumberShouldMatch component ... ?

The code on GitHub at https://github.com/apache/lucene-solr/pull/75/commits/c37f1e0d66f1f28a5c83033d9496cc33c55f265e
uses NearSpansOrdered and NearSpansUnordered with all subSpans, as usual; see lines 277/278.

I think that is too strict when more than the required number of subSpans are actually present
in the segment.
The presence of subSpans should be checked at the document level, and even then fewer subSpans
than are present in the document might match within the given slop/window.
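
To illustrate the difference, here is a small sketch in plain Java. It does not use the Lucene
API: the class and method names are made up for this example, and a fixed window of consecutive
tokens stands in for the slop handling of the real near spans. It counts how many query terms
are present in a document and how many of them ever fall into one window together.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene: terms are plain strings and the
// "window" is a fixed number of consecutive tokens (a simplification of slop).
class SubSpanPresence {

    /** Number of distinct query terms that occur anywhere in the document. */
    static int presentInDocument(List<String> tokens, Set<String> terms) {
        Set<String> seen = new HashSet<>(tokens);
        seen.retainAll(terms);
        return seen.size();
    }

    /** Largest number of distinct query terms found in any run of `window` consecutive tokens. */
    static int bestInWindow(List<String> tokens, Set<String> terms, int window) {
        int best = 0;
        for (int start = 0; start < tokens.size(); start++) {
            int end = Math.min(tokens.size(), start + window);
            Set<String> seen = new HashSet<>(tokens.subList(start, end));
            seen.retainAll(terms);
            best = Math.max(best, seen.size());
        }
        return best;
    }

    public static void main(String[] args) {
        // t1, t2 and t3 are close together; t4 is present but far away.
        List<String> doc = Arrays.asList("t1 t2 x x t3 x x x x x x x x x x t4".split(" "));
        Set<String> terms = new HashSet<>(Arrays.asList("t1", "t2", "t3", "t4"));
        int window = 10;
        int minMatch = 3;

        int present = presentInDocument(doc, terms);
        int best = bestInWindow(doc, terms, window);
        System.out.println("terms present in document: " + present);
        System.out.println("most terms in one window:  " + best);
        System.out.println("all terms near each other: " + (best == terms.size()));
        System.out.println("at least " + minMatch + " near each other: " + (best >= minMatch));
    }
}
```

In this document all four terms are present, but no window of 10 tokens holds more than three
of them. Requiring every subSpans to take part in the near match rejects the document, while a
minimum of three would accept it.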

The (untested) all pairs code above tries to do that, but only for pairs of subSpans.





> Add minNumberShouldMatch parameter to SpanNearQuery
> ---------------------------------------------------
>
>                 Key: LUCENE-7434
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7434
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: AllPairsNearSpans20160902.patch
>
>
> On the user list, [~saar32] asked about a new type of SpanQuery that would allow for something like BooleanQuery's minimumNumberShouldMatch
> bq. Given a set of search terms (t1, t2, t3, ti), return all documents where in a sequence of x=10 tokens at least c=3 of the search terms appear within the sequence.
> I _think_ we can modify SpanNearQuery fairly easily to accommodate this.  I'll submit a PR in the next few days.
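
The requested semantics (at least c of the terms within a sequence of x tokens) can be sketched
as a sliding window over the token stream. This is plain Java for illustration only, not the
proposed SpanNearQuery change; the class name, method name and parameters are assumptions made
for this example.

```java
import java.util.*;

// Hypothetical illustration of the requested matching rule, independent of Lucene.
class MinShouldMatchWindow {

    /**
     * Returns true if some run of `window` consecutive tokens contains
     * at least `minMatch` distinct terms from `terms`.
     */
    static boolean matches(List<String> tokens, Set<String> terms, int window, int minMatch) {
        // Query term -> number of occurrences inside the current window.
        Map<String, Integer> inWindow = new HashMap<>();
        for (int end = 0; end < tokens.size(); end++) {
            String entering = tokens.get(end);
            if (terms.contains(entering)) {
                inWindow.merge(entering, 1, Integer::sum);
            }
            // The token that is now `window` positions behind `end` leaves the window.
            int leavingPos = end - window;
            if (leavingPos >= 0) {
                String leaving = tokens.get(leavingPos);
                if (terms.contains(leaving)) {
                    // Returning null removes the entry once its count drops to zero.
                    inWindow.computeIfPresent(leaving, (t, n) -> n == 1 ? null : n - 1);
                }
            }
            if (inWindow.size() >= minMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a t1 b t2 c d t3 e f g h i j k t4".split(" "));
        Set<String> terms = new HashSet<>(Arrays.asList("t1", "t2", "t3", "t4"));
        System.out.println("x=10, c=3: " + matches(doc, terms, 10, 3));
        System.out.println("x=10, c=4: " + matches(doc, terms, 10, 4));
    }
}
```

Each token enters and leaves the window once, so the check is linear in the document length. A
real implementation would work on the positions reported by the subSpans rather than on a token
list.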




