lucene-dev mailing list archives

From "paul.elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-569) NearSpans skipTo bug
Date Tue, 16 May 2006 07:39:06 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-569?page=comments#action_12411904 ] 

paul.elschot commented on LUCENE-569:
-------------------------------------

> I tried to make sense of the existing NearSpans implementation over the weekend ... I did not succeed.
> I still haven't had a chance to look at the new one in LUCENE-413, but I want to clarify something you said..

For the unordered case, the priority queue implementation over the subspans in the current
NearSpans is fine.
For the ordered case, I could not figure out how to deal with the priority queue and the
restriction on ordering at the same time. This is precisely what the bug above shows.
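
A minimal sketch of what a priority queue over subspans can look like for the unordered case. This is not the NearSpans code itself: the class SubSpanQueueSketch, the holder Pos, and the use of java.util.PriorityQueue are assumptions made for illustration. The ordering (document first, then start, then end) follows the description in this comment that ends are compared when starts are equal.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class SubSpanQueueSketch {
    /** Hypothetical holder for the current position of one subspan. */
    static final class Pos {
        final int doc, start, end;
        Pos(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    /** Order by document, then by start, then by end when the starts are equal. */
    static final Comparator<Pos> ORDER =
        Comparator.comparingInt((Pos p) -> p.doc)
                  .thenComparingInt(p -> p.start)
                  .thenComparingInt(p -> p.end);

    static PriorityQueue<Pos> newQueue() {
        return new PriorityQueue<>(ORDER);
    }

    public static void main(String[] args) {
        PriorityQueue<Pos> queue = newQueue();
        queue.add(new Pos(2, 0, 1));
        queue.add(new Pos(1, 5, 7));
        queue.add(new Pos(1, 5, 6));
        // The head is the subspan that is furthest behind: lowest doc, start, end.
        Pos head = queue.poll();
        System.out.println(head.doc + " " + head.start + " " + head.end); // prints "1 5 6"
    }
}
```

The queue always yields the subspan that is furthest behind, which is enough for the unordered case. It says nothing about which subspan belongs to which clause of the query, and that is the information the ordered case also has to respect.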
 
> >>> The NearSpansOrdered there differs from the current version in that it does not
> >>> match overlapping subspans, but it passes all current test cases including TestNearSpans here.
>
> ...should I understand you to mean then that the current implementation of NearSpans does work
> correctly with overlapping sub-spans ... there just isn't a test for it?

For ordered queries, it might work with overlapping sub-spans in some cases.
However, I'd expect any such test to run into the bug above for some other ordered cases.
 
> that seems like important enough behavior that we wouldn't want to break it to fix this bug.

Given the bug, I hope nothing depends on it.

> Even if matching on overlapping subspans wasn't an intentional feature of NearSpans -- the fact that it
> currently works and the documentation is silent on the issue suggests to me that it should remain supported.

That can probably be done by modifying the NearSpansOrdered of LUCENE-413 at lines 133-138 and
at line 167, where the end of the previous (possibly matching) subspans is compared to the
start of the next one.
These comparisons could compare the start of the previous subspans with the start of the next
one instead.
I don't know what precisely the intended behaviour is, so I can't say whether these changed
comparisons should allow equality or not. Perhaps the ends should be compared when the starts
are equal, just like it is done in the priority queue for the unordered case.
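
The two comparisons can be sketched as follows. This is only an illustration, not the code from LUCENE-413: the class OrderedSubSpanCheck and both method names are hypothetical, span ends are assumed to be exclusive, and the choices about equality (<= in the first check, strict < in the second) are assumptions, since the intended behaviour is exactly what is left open above.

```java
public class OrderedSubSpanCheck {
    /**
     * Non-overlapping check: the previous subspan must end before the next one starts.
     * Allowing prevEnd == nextStart is an assumption (exclusive ends).
     */
    static boolean orderedNonOverlapping(int prevEnd, int nextStart) {
        return prevEnd <= nextStart;
    }

    /**
     * Overlap-tolerant check: compare the starts, and compare the ends only
     * when the starts are equal. Strict comparison is an assumption.
     */
    static boolean orderedByStart(int prevStart, int prevEnd, int nextStart, int nextEnd) {
        if (prevStart != nextStart) {
            return prevStart < nextStart;
        }
        return prevEnd < nextEnd;
    }

    public static void main(String[] args) {
        // Previous subspan [3,6) overlaps the next subspan [5,7).
        System.out.println(orderedNonOverlapping(6, 5));   // prints "false"
        System.out.println(orderedByStart(3, 6, 5, 7));    // prints "true"
        // Equal starts: the ends decide.
        System.out.println(orderedByStart(3, 5, 3, 6));    // prints "true"
        System.out.println(orderedByStart(3, 6, 3, 6));    // prints "false"
    }
}
```

With the first check the overlapping pair is rejected. With the second it is accepted as ordered, which corresponds to keeping matches on overlapping subspans.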



> NearSpans skipTo bug
> --------------------
>
>          Key: LUCENE-569
>          URL: http://issues.apache.org/jira/browse/LUCENE-569
>      Project: Lucene - Java
>         Type: Bug

>   Components: Search
>     Reporter: Hoss Man
>  Attachments: TestNearSpans.java
>
> NearSpans appears to have a bug in skipTo that causes it to skip over some matching documents
> completely.  I discovered this bug while investigating problems with SpanWeight.explain, but
> as far as I can tell the bug is not specific to Explanations ... it seems like it could potentially
> result in incorrect matching in some situations where a SpanNearQuery is nested in another
> query such that skipTo will be used ... I tried to create a high level test case to exploit
> the bug when searching, but I could not.  TestCase exploiting the class using NearSpan and
> SpanScorer will follow...

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

