lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Spans questions
Date Sun, 16 Sep 2007 08:34:03 GMT

Meanwhile it occurred to me that your situation is about containment of spans,
whereas the current implementation deals with overlap and order. Containment
is actually a special case of overlap, but with containment there is less
need to talk about order. Perhaps span containment could even be
treated as a case closely related to SpanNotQuery.
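
A minimal sketch of the distinction, in plain Java rather than the Lucene API.
Spans are modeled as half-open position ranges [start, end), matching the
exclusive Spans.end() confirmed in the quoted discussion. The class and method
names (SpanIntervals, overlaps, contains) are hypothetical and exist only for
illustration; this is not how NearSpans is implemented.

```java
// Simplified model of span positions; not Lucene's implementation.
final class SpanIntervals {
    private SpanIntervals() {}

    // A span covers positions start .. end-1; end is exclusive, as with Spans.end().
    // Two spans overlap when they share at least one position.
    static boolean overlaps(int startA, int endA, int startB, int endB) {
        return startA < endB && startB < endA;
    }

    // True when [innerStart, innerEnd) lies entirely within [outerStart, outerEnd).
    static boolean contains(int outerStart, int outerEnd, int innerStart, int innerEnd) {
        return outerStart <= innerStart && innerEnd <= outerEnd;
    }

    public static void main(String[] args) {
        // The two spans reported for doc 4 ("u2 u2 u1") in the quoted test.
        System.out.println("overlaps: " + overlaps(0, 3, 1, 3));
        System.out.println("contains: " + contains(0, 3, 1, 3));
        // Adjacent single-term spans [0,1) and [1,2).
        System.out.println("adjacent overlap: " + overlaps(0, 1, 1, 2));
    }
}
```

For the spans 0-3 and 1-3 from the quoted test, both checks hold: the second
span overlaps the first and is also contained in it. The containment check
never needs to ask which span comes first, which is why order matters less
there.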

Regards,
Paul Elschot



On Sunday 16 September 2007 04:43, Grant Ingersoll wrote:
> 
> On Aug 30, 2007, at 2:42 PM, Grant Ingersoll wrote:
> 
> >
> > On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:
> >
> >> Grant,
> >>
> >> On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
> >>> Couple of Spans questions for people:
> >>>
> >>> 1.  Would the docs be clearer for Spans.end() if it said that the
> >>> span is not inclusive of the end position?  From what I can tell, it
> >>> is not inclusive, correct?
> >>
> >> Yes. The easiest place to see that is in TermSpans.end(),
> >> which is the term position plus 1, see TermSpans.java line 89.
> >>
> >
> > I will update the docs to make it explicit.
> >
> >>
> >>> 2. I have added the following test to TestSpans.java
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>>      SpanNearQuery snq;
> >>>      SpanNearQuery u1u2 = new SpanNearQuery(
> >>>          new SpanQuery[] {makeSpanTermQuery("u1"), makeSpanTermQuery("u2")},
> >>>          0, false);
> >>>      snq = new SpanNearQuery(
> >>>                                new SpanQuery[] {
> >>>                                  u1u2,
> >>>                                  makeSpanTermQuery("u2")
> >>>                                },
> >>>                                1,
> >>>                                false);
> >>>      spans = snq.getSpans(searcher.getIndexReader());
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 4, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 3, spans.end());
> >>>
> >>> //Why does this match?
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 4, spans.doc());
> >>>      assertEquals("start", 1, spans.start());
> >>>      assertEquals("end", 3, spans.end());
> >>>
> >>>      ...
> >>>    }
> >>>
> >>> My question is why does the second span match?  Doc 4 looks like:
> >>> "u2 u2 u1"  (see the docFields array in TestSpans.java)  It seems
> >>> incorrect because it is completely inside of the other Span, but
> >>> maybe I am just not understanding the slop factor or something about
> >>> unordered spans.  I would think there would only be one match for
> >>> this document since the u1u2 has a slop of 0 and the snq has a slop
> >>> of 1 (which shouldn't matter, since there are no other permutations).
> >>
> >> I split the original NearSpans into an ordered and an unordered
> >> version because there was a bug LUCENE-569 for the ordered
> >> case that was difficult to fix while keeping these two cases
> >> in the same class.
> >>
> >> I documented the ordered case in the javadoc of the
> >> NearSpansOrdered class. I also specialized the original
> >> NearSpans class to implement only the unordered case,
> >> and did not add javadoc comments there.
> >>
> >> In the current version of NearSpansOrdered the subspans should
> >> not overlap to form a match. I did that to prevent the
> >> ordered spans query "t1 t1" from matching all single occurrences of t1.
> >> Btw., similar considerations apply to terms indexed at the same
> >> position. However, iirc there is no test case for a span near query
> >> with the same terms (subspans).
> >>
> >
> >> At the time of LUCENE-569 I considered writing separate versions
> >> of ordered/unordered and overlapping/non overlapping, but that
> >> would have resulted in four different cases, and the split into
> >> ordered/unordered was enough to fix the bug, so I left it at that.
> >> The split into ordered and unordered was a split
> >> into (ordered + non overlapping) and (unordered + overlapping),
> >> and this is what you see in your test cases for unordered spans.
> >>
> >> To fully clarify the semantics of NearSpans, it is probably a good
> >> idea to make all four cases for the subspans separately available.
> >>
> >>
> >
> > Thanks for the info, Paul.  This makes sense.  I am not sure how I
> > feel about spans within spans.  I think in my test case it isn't
> > that they are overlapping; rather, one is a subset of the other, which
> > doesn't seem correct, but maybe I am wrong.  I think you are right
> > that we should make the 4 cases explicit.
> 
> In thinking about this some more, I think it is actually doing a  
> reasonable thing, even if it is still a subset of the other, thus I  
> am going to leave it as is (and update my test).  The results that  
> are returned are "narrower" and I can thus see a case being made for  
> returning them.
> 
> Still, given a doc:
> u2 u2 u1
> 
> and
>      SpanNearQuery u1u2 = new SpanNearQuery(
>          new SpanQuery[] {makeSpanTermQuery("u1"), makeSpanTermQuery("u2")},
>          0, false);
>      snq = new SpanNearQuery(
>                                new SpanQuery[] {
>                                  u1u2,
>                                  makeSpanTermQuery("u2")
>                                },
>                                1,
>                                false);
> 
> 
> I am not totally sure it makes sense to return 0-3 as a span AND 1-3  
> as Span because the second "u2" is being used to satisfy the u1u2  
> clause AND the solo "u2" clause in the snq query above.  However,  
> since this behavior has been around for a while and no one has really  
> complained and I can understand wanting to satisfy the clauses this  
> way, I can be convinced to leave it alone.
> 
> Anyone have opinions otherwise?
> 
> 
> 
> >
> >
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >> P.S. I also remember hesitating between the class names
> >> NearSpansUnordered and NearSpansUnOrdered. In case
> >> you want to change the class name in the trunk to
> >> NearSpansUnOrdered, please do so.
> >>
> >
> > I won't change them.  I am never sure how to name those edge cases,  
> > either.
> >
> > Cheers,
> > Grant
> >
> >
> >>> In my mind, the correct test should be something like:
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>>      SpanNearQuery snq;
> >>>      snq = new SpanNearQuery(
> >>>                                new SpanQuery[] {
> >>>                                  makeSpanTermQuery("u1"),
> >>>                                  makeSpanTermQuery("u2") },
> >>>                                0,
> >>>                                false);
> >>>      Spans spans = snq.getSpans(searcher.getIndexReader());
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 4, spans.doc());
> >>>      assertEquals("start", 1, spans.start());
> >>>      assertEquals("end", 3, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 5, spans.doc());
> >>>      assertEquals("start", 2, spans.start());
> >>>      assertEquals("end", 4, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 8, spans.doc());
> >>>      assertEquals("start", 2, spans.start());
> >>>      assertEquals("end", 4, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 9, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 2, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 10, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 2, spans.end());
> >>>      assertTrue("Has next and it shouldn't: " + spans.doc(),
> >>>                 spans.next() == false);
> >>>
> >>>      SpanNearQuery u1u2 = new SpanNearQuery(
> >>>          new SpanQuery[] {makeSpanTermQuery("u1"), makeSpanTermQuery("u2")},
> >>>          0, false);
> >>>      snq = new SpanNearQuery(
> >>>                                new SpanQuery[] {
> >>>                                  u1u2,
> >>>                                  makeSpanTermQuery("u2")
> >>>                                },
> >>>                                1,
> >>>                                false);
> >>>      spans = snq.getSpans(searcher.getIndexReader());
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 4, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 3, spans.end());
> >>>
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 5, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 4, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 8, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 5, spans.end());
> >>>
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 9, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 5, spans.end());
> >>>
> >>>      assertTrue("Does not have next and it should", spans.next());
> >>>      assertEquals("doc", 10, spans.doc());
> >>>      assertEquals("start", 0, spans.start());
> >>>      assertEquals("end", 5, spans.end());
> >>>      assertTrue("Has next and it shouldn't", spans.next() == false);
> >>>    }
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> Grant
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>>
> >>>
> >>>
> >>
> >>
> >
> > ------------------------------------------------------
> > Grant Ingersoll
> > http://www.grantingersoll.com/
> > http://lucene.grantingersoll.com
> > http://www.paperoftheweek.com/
> >
> >
> >
> >
> 
