lucene-dev mailing list archives

From "paul.elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-569) NearSpans skipTo bug
Date Sat, 13 May 2006 20:53:08 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-569?page=comments#action_12383404 ] 

paul.elschot commented on LUCENE-569:
-------------------------------------

Hoss,

I'm afraid you've uncovered a bug in NearSpans.java for the ordered case.
The test case testNearSpansSkipToLikeNext() uses this test data:
doc 0: w1 w2 w3 .. ..
doc 1: w1 w3 w2 w3 ..
and an ordered SpanNearQuery with slop 1 for "w1 w2 w3" should match both doc 0 and doc 1.
The test first does a skipTo(0) on the NearSpans, which succeeds and matches doc 0.
Then it tries skipTo(1) on the NearSpans, which should succeed but fails, because
NearSpans first does skipTo(1) on the Spans for the query terms,
which puts these term spans at
doc 1: w1 w3 w2
(as expected) but this does not match because it's not ordered.
The NearSpans then tries a next() on itself, which starts by doing next() on the term spans
for w1 in NearSpans.java near line 146:
      more = min().next();                        // trigger further scanning
However, in the ordered case, it should have advanced the first out-of-order term,
here w3, and so it misses the match:
doc 1: w1 .. w2 w3 ..
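
The difference between the two advancing strategies can be sketched on plain position
arrays. This is only a simplified model under stated assumptions, not the code in
NearSpans.java or in LUCENE-413: the class OrderedNearSketch and its methods are
hypothetical names, each term is reduced to a sorted array of its positions in one
document, and slop is taken as (end - start) minus the number of terms.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
class OrderedNearSketch {

    /** True if the current positions are strictly increasing in query order and within slop. */
    static boolean orderedWithinSlop(int[][] pos, int[] idx, int slop) {
        int n = pos.length;
        for (int i = 1; i < n; i++) {
            if (pos[i][idx[i]] <= pos[i - 1][idx[i - 1]]) return false;
        }
        int start = pos[0][idx[0]];
        int end = pos[n - 1][idx[n - 1]] + 1;   // exclusive end
        return (end - start) - n <= slop;
    }

    /** Model of the behaviour described above: always advance the term at the minimum position. */
    static boolean matchAdvancingMin(int[][] pos, int slop) {
        for (int[] p : pos) if (p.length == 0) return false;
        int[] idx = new int[pos.length];
        while (true) {
            if (orderedWithinSlop(pos, idx, slop)) return true;
            int min = 0;
            for (int i = 1; i < pos.length; i++) {
                if (pos[i][idx[i]] < pos[min][idx[min]]) min = i;
            }
            if (++idx[min] >= pos[min].length) return false;   // that term is exhausted
        }
    }

    /** Ordered variant: advance each term that is not after its predecessor. */
    static boolean matchAdvancingOutOfOrder(int[][] pos, int slop) {
        for (int[] p : pos) if (p.length == 0) return false;
        int[] idx = new int[pos.length];
        while (true) {
            for (int i = 1; i < pos.length; i++) {
                while (pos[i][idx[i]] <= pos[i - 1][idx[i - 1]]) {
                    if (++idx[i] >= pos[i].length) return false;
                }
            }
            if (orderedWithinSlop(pos, idx, slop)) return true;
            if (++idx[0] >= pos[0].length) return false;       // try a later start
        }
    }

    public static void main(String[] args) {
        // Rows are the query terms in order: w1, w2, w3.
        int[][] doc0 = { {0}, {1}, {2} };       // doc 0: w1 w2 w3 .. ..
        int[][] doc1 = { {0}, {2}, {1, 3} };    // doc 1: w1 w3 w2 w3 ..
        System.out.println(matchAdvancingMin(doc0, 1));         // true
        System.out.println(matchAdvancingMin(doc1, 1));         // false
        System.out.println(matchAdvancingOutOfOrder(doc1, 1));  // true
    }
}
```

In this model, advancing the minimum on doc 1 exhausts w1 immediately and reports no
match, while advancing the out-of-order w3 from position 1 to position 3 finds the match.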

I would recommend using the alternative NearSpans from LUCENE-413 mentioned above
to fix this. The NearSpansOrdered there differs from the current version in that it does not
match overlapping subspans, but it passes all current test cases, including TestNearSpans here.
Overlaps between Spans can occur when SpanNearQueries are nested and/or when multiple
terms are indexed at the same position.
If this ordered, non-overlapping matching becomes an issue, it can always be fixed later.
The NearSpansUnordered there is just like the current NearSpans, only simplified, and it
matches overlapping subspans.


> NearSpans skipTo bug
> --------------------
>
>          Key: LUCENE-569
>          URL: http://issues.apache.org/jira/browse/LUCENE-569
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Reporter: Hoss Man
>  Attachments: TestNearSpans.java
>
> NearSpans appears to have a bug in skipTo that causes it to skip over some matching documents
> completely.  I discovered this bug while investigating problems with SpanWeight.explain, but
> as far as I can tell the bug is not specific to Explanations ... it seems like it could potentially
> result in incorrect matching in some situations where a SpanNearQuery is nested in another
> query such that skipTo will be used ... I tried to create a high level test case to exploit
> the bug when searching, but I could not.  TestCase exploiting the class using NearSpan and
> SpanScorer will follow...

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

