lucene-dev mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: Spans questions
Date Thu, 30 Aug 2007 18:42:36 GMT

On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:

> Grant,
>
> On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
>> Couple of Spans questions for people:
>>
>> 1.  Would the docs be clearer for Spans.end() if it said that the
>> span is not inclusive of the end position?  From what I can tell, it
>> is not inclusive, correct?
>
> Yes. The easiest place to see that is in TermSpans.end(),
> which is the term position plus 1, see TermSpans.java line 89.
>

I will update the docs to make it explicit.
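
To make the half-open convention concrete, here is a minimal sketch of a
single-term span. The class TermSpanSketch and its covers() helper are
hypothetical names used only for illustration; this is not the TermSpans
source, it only mirrors the "term position plus 1" rule Paul describes.

```java
// Simplified model of a single-term span; not Lucene's TermSpans class.
final class TermSpanSketch {
    final int start; // position of the term (inclusive)
    final int end;   // one past the term position (exclusive)

    TermSpanSketch(int termPosition) {
        this.start = termPosition;
        this.end = termPosition + 1; // mirrors "term position plus 1"
    }

    // Number of positions covered by the span.
    int length() {
        return end - start;
    }

    // True when the position lies inside the half-open interval [start, end).
    boolean covers(int position) {
        return start <= position && position < end;
    }

    public static void main(String[] args) {
        TermSpanSketch span = new TermSpanSketch(2);
        System.out.println(span.start + " " + span.end + " " + span.length()); // 2 3 1
        System.out.println(span.covers(2) + " " + span.covers(3));             // true false
    }
}
```

With an exclusive end, end - start is the span length, and a span that
ends at 3 does not include position 3.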

>
>> 2. I have added the following test to TestSpans.java
>> public void testSpanNearUnOrdered() throws Exception {
>>
>>      SpanNearQuery snq;
>>      SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
>> {makeSpanTermQuery("u1"),
>>                                  makeSpanTermQuery("u2")}, 0, false);
>>      snq = new SpanNearQuery(
>>                                new SpanQuery[] {
>>                                  u1u2,
>>                                  makeSpanTermQuery("u2")
>>                                },
>>                                1,
>>                                false);
>>      spans = snq.getSpans(searcher.getIndexReader());
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 4, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 3, spans.end());
>>
>> //Why does this match?
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 4, spans.doc());
>>      assertEquals("start", 1, spans.start());
>>      assertEquals("end", 3, spans.end());
>>
>>      ...
>>    }
>>
>> My question is why does the second span match?  Doc 4 looks like:
>> "u2 u2 u1" (see the docFields array in TestSpans.java).  It seems
>> incorrect because it is completely inside of the other Span, but
>> maybe I am just not understanding the slop factor or something about
>> unordered spans.  I would think there would only be one match for
>> this document since the u1u2 has a slop of 0 and the snq has a slop
>> of 1 (which shouldn't matter, since there are no other permutations).
>
> I split the original NearSpans into an ordered and an unordered
> version because there was a bug LUCENE-569 for the ordered
> case that was difficult to fix while keeping these two cases
> in the same class.
>
> I documented the ordered case in the javadoc of the
> NearSpansOrdered class. I also specialized the original
> NearSpans class to implement only the unordered case,
> and did not add javadoc comments there.
>
> In the current version of NearSpansOrdered the subspans should
> not overlap to form a match. I did that to prevent the
> ordered spans query "t1 t1" from matching all single occurrences of t1.
> Btw. similar considerations apply for terms indexed at the same
> position. However, iirc there is no test case for a span near query
> with the same terms (subspans).
>

> At the time of LUCENE-569 I considered writing separate versions
> of ordered/unordered and overlapping/non overlapping, but that
> would have resulted in four different cases, and the split into
> ordered/unordered was enough to fix the bug, so I left it at that.
> The split into ordered and unordered was a split
> into (ordered + non overlapping) and (unordered + overlapping),
> and this is what you see in your test cases for unordered spans.
>
> To fully clarify the semantics of NearSpans, it is probably a good
> idea to make all four cases for the subspans separately available.
>
>

Thanks for the info, Paul.  This makes sense.  I am not sure how I
feel about spans within spans.  In my test case the spans are not
merely overlapping: one is a subset of the other, which doesn't
seem correct, but maybe I am wrong.  I think you are right that we
should make the 4 cases explicit.
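
A rough way to see why both spans come back for doc 4 is a brute-force
model of the unordered case. The sketch below is not the
NearSpansUnordered algorithm. The class NearSpanSketch, its methods, and
the allowOverlap flag are hypothetical, and the slop rule (match length
minus the summed sub-span lengths must not exceed slop) is an assumption
about how unordered matching behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Brute-force model for illustration only; this is NOT Lucene's NearSpansUnordered.
final class NearSpanSketch {

    // A span is an int[]{start, end}; end is exclusive, as in Spans.end().
    static int[] span(int start, int end) {
        return new int[] {start, end};
    }

    // Pairs every span of a with every span of b (order ignored) and keeps the
    // pairs whose uncovered gap fits within slop. Assumed slop rule:
    // (match length) - (sum of sub-span lengths) <= slop.
    static List<int[]> unorderedNear(List<int[]> a, List<int[]> b, int slop, boolean allowOverlap) {
        List<int[]> matches = new ArrayList<>();
        for (int[] x : a) {
            for (int[] y : b) {
                boolean overlaps = x[0] < y[1] && y[0] < x[1];
                if (overlaps && !allowOverlap) {
                    continue; // the hypothetical non-overlapping variant rejects this pair
                }
                int start = Math.min(x[0], y[0]);
                int end = Math.max(x[1], y[1]);
                int gap = (end - start) - ((x[1] - x[0]) + (y[1] - y[0]));
                if (gap <= slop) {
                    matches.add(span(start, end));
                }
            }
        }
        return matches;
    }

    // Renders spans as "[start,end) [start,end) ...".
    static String format(List<int[]> spans) {
        StringBuilder sb = new StringBuilder();
        for (int[] s : spans) {
            sb.append('[').append(s[0]).append(',').append(s[1]).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Doc 4 is "u2 u2 u1": u2 at positions 0 and 1, u1 at position 2.
        List<int[]> u1 = Arrays.asList(span(2, 3));
        List<int[]> u2 = Arrays.asList(span(0, 1), span(1, 2));

        List<int[]> u1u2 = unorderedNear(u1, u2, 0, true);
        System.out.println(format(u1u2));                              // [1,3)
        System.out.println(format(unorderedNear(u1u2, u2, 1, true)));  // [0,3) [1,3)
        System.out.println(format(unorderedNear(u1u2, u2, 1, false))); // [0,3)
    }
}
```

Under these assumptions the inner u1u2 span [1,3) pairs with the u2 at
position 0 to give [0,3), and with the u2 at position 1 (which it
already contains) to give [1,3) again. Rejecting overlapping sub-spans
leaves only [0,3), which is the single match I expected.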


> Regards,
> Paul Elschot
>
>
> P.S. I also remember hesitating between the class names
> NearSpansUnordered and NearSpansUnOrdered. In case
> you want to change the class name in the trunk to
> NearSpansUnOrdered, please do so.
>

I won't change them.  I am never sure how to name those edge cases,  
either.

Cheers,
Grant


>> In my mind, the correct test should be something like:
>> public void testSpanNearUnOrdered() throws Exception {
>>
>>      SpanNearQuery snq;
>>      snq = new SpanNearQuery(
>>                                new SpanQuery[] {
>>                                  makeSpanTermQuery("u1"),
>>                                  makeSpanTermQuery("u2") },
>>                                0,
>>                                false);
>>      Spans spans = snq.getSpans(searcher.getIndexReader());
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 4, spans.doc());
>>      assertEquals("start", 1, spans.start());
>>      assertEquals("end", 3, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 5, spans.doc());
>>      assertEquals("start", 2, spans.start());
>>      assertEquals("end", 4, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 8, spans.doc());
>>      assertEquals("start", 2, spans.start());
>>      assertEquals("end", 4, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 9, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 2, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 10, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 2, spans.end());
>>      assertTrue("Has next and it shouldn't: " + spans.doc(),
>> spans.next() == false);
>>
>>      SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
>> {makeSpanTermQuery("u1"),
>>                                  makeSpanTermQuery("u2")}, 0, false);
>>      snq = new SpanNearQuery(
>>                                new SpanQuery[] {
>>                                  u1u2,
>>                                  makeSpanTermQuery("u2")
>>                                },
>>                                1,
>>                                false);
>>      spans = snq.getSpans(searcher.getIndexReader());
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 4, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 3, spans.end());
>>
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 5, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 4, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 8, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 5, spans.end());
>>
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 9, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 5, spans.end());
>>
>>      assertTrue("Does not have next and it should", spans.next());
>>      assertEquals("doc", 10, spans.doc());
>>      assertEquals("start", 0, spans.start());
>>      assertEquals("end", 5, spans.end());
>>      assertTrue("Has next and it shouldn't", spans.next() == false);
>>    }
>>
>>
>>
>> Thanks,
>> Grant
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


