lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true
Date Thu, 19 May 2011 22:29:47 GMT


Hoss Man commented on LUCENE-3120:

comment i made on the mailing list regarding this topic...

the crux of the issue (as i recall) is that there is really no conceptual reason why a
query for "'john' near 'john', in any order, with slop of Z" shouldn't match a doc that contains
only one instance of "john" ... the first SpanTermQuery says "i found a match at position
X" the second SpanTermQuery says "i found a match at position Y" and the SpanNearQuery says
"the difference between X and Y is less than Z" therefore i have a match. (The SpanNearQuery
can't fail just because X and Y are the same -- they might be two distinct term instances,
with different payloads perhaps, that just happen to have the same position).

However: the inOrder=true case works because the SpanNearQuery enforces that "X must be less
than Y", so the same term instance can't ever match twice.
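
The reasoning above can be sketched with a small stand-alone model. This is not Lucene code: the class name SameTermNearSketch and the helpers positions() and near() are hypothetical, and the gap formula (distance between the two positions minus one) is a simplifying assumption about how slop is measured for two single-term clauses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a two-clause "near" check; it does not use Lucene.
class SameTermNearSketch {

    // Positions at which `term` occurs in a whitespace-tokenized document.
    static List<Integer> positions(String doc, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = doc.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    // Each clause independently reports a position (X for t1, Y for t2).
    // The near check only looks at how far apart X and Y are.
    static boolean near(String doc, String t1, String t2, int slop, boolean inOrder) {
        for (int x : positions(doc, t1)) {
            for (int y : positions(doc, t2)) {
                if (inOrder && x >= y) {
                    continue; // in-order requires X < Y, which rules out X == Y
                }
                int gap = Math.abs(x - y) - 1; // assumed slop measure; -1 when X == Y
                if (gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "john smith"; // only one instance of "john"
        System.out.println(near(doc, "john", "john", 0, false)); // true
        System.out.println(near(doc, "john", "john", 0, true));  // false
    }
}
```

In this model the any-order check accepts X == Y because nothing tells it that both clauses reported the same term instance, while the in-order check rejects it as a side effect of requiring X < Y.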

> span query matches too many docs when two query terms are the same unless inOrder=true
> --------------------------------------------------------------------------------------
>                 Key: LUCENE-3120
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>            Reporter: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: LUCENE-3120.patch, LUCENE-3120.patch
> spinoff of user list discussion - [SpanNearQuery - inOrder parameter|].
> With 3 documents:
> *  "a b x c d"
> *  "a b b d"
> *  "a b x b y d"
> Here are a few queries (the number in parentheses indicates expected #hits):
> These ones work *as expected*:
> * (1)  in-order, slop=0, "b", "x", "b"
> * (1)  in-order, slop=0, "b", "b"
> * (2)  in-order, slop=1, "b", "b"
> These ones match *too many* hits:
> * (1)  any-order, slop=0, "b", "x", "b"
> * (1)  any-order, slop=1, "b", "x", "b"
> * (1)  any-order, slop=2, "b", "x", "b"
> * (1)  any-order, slop=3, "b", "x", "b"
> These ones match *too many* hits as well:
> * (1)  any-order, slop=0, "b", "b"
> * (2)  any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span queries|] - as indicated by Hoss, so we might decide to close this bug after all, but I would like to at least have the junit that exposes the behavior in JIRA.
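
The three documents and the queries listed in the issue can be run through a simplified stand-alone model to see where the extra hits come from. Everything here is an assumption made for illustration: the class SpanNearIssueSketch, the methods matches() and hits(), and the `distinct` flag are hypothetical, slop is modeled as the width of the covered window minus the number of clauses, and none of this is the Lucene implementation or the JUnit test attached to the issue. Hit counts from a real Lucene index may differ from this model.

```java
import java.util.List;

// Hypothetical model of an N-clause near query over the issue's three documents.
class SpanNearIssueSketch {

    static final List<String> DOCS = List.of("a b x c d", "a b b d", "a b x b y d");

    // Picks one position per clause and reports whether some pick fits the slop.
    // inOrder:  picked positions must be strictly increasing in clause order.
    // distinct: no two clauses may use the same position (the expected behavior).
    static boolean matches(String[] tokens, String[] terms, int slop,
                           boolean inOrder, boolean distinct, int[] picked, int depth) {
        if (depth == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : picked) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Assumed slop measure: window width minus number of clauses.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!tokens[pos].equals(terms[depth])) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < depth && ok; i++) {
                if (inOrder && picked[i] >= pos) ok = false;
                if (distinct && picked[i] == pos) ok = false;
            }
            if (!ok) {
                continue;
            }
            picked[depth] = pos;
            if (matches(tokens, terms, slop, inOrder, distinct, picked, depth + 1)) {
                return true;
            }
        }
        return false;
    }

    // Number of documents in DOCS that match the query under the model.
    static int hits(int slop, boolean inOrder, boolean distinct, String... terms) {
        int count = 0;
        for (String doc : DOCS) {
            String[] tokens = doc.split(" ");
            if (matches(tokens, terms, slop, inOrder, distinct, new int[terms.length], 0)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // any-order, slop=0, "b", "x", "b": positions may be reused vs. must be distinct
        System.out.println(hits(0, false, false, "b", "x", "b")); // 2 in this model
        System.out.println(hits(0, false, true, "b", "x", "b"));  // 1, the expected count
        // any-order, slop=0, "b", "b"
        System.out.println(hits(0, false, false, "b", "b"));      // 3 in this model
        System.out.println(hits(0, false, true, "b", "b"));       // 1, the expected count
        // in-order, slop=1, "b", "b": ordering alone already prevents reuse
        System.out.println(hits(1, true, false, "b", "b"));       // 2, the expected count
    }
}
```

With `distinct` set to true the model reproduces every expected count in the lists above; with it set to false, the any-order queries also accept documents where a single "b" is counted for two clauses.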

This message is automatically generated by JIRA.