lucene-dev mailing list archives

From Michael McCandless <>
Subject SpanNearQuery's spans & payloads
Date Fri, 11 Sep 2009 18:32:19 GMT
Under LUCENE-1458, I'm hitting a curious test failure in
TestPositionsIncrement.testPayloadsPos0.  The failure happens because
the codec I'm testing (pulsing codec) allows you to retrieve the same
payload more than once if the term was pulsed (inlined into terms
dict), whereas w/ trunk you can only retrieve the payload once.

But in debugging the failure, I'm struggling with what the correct
behavior of SpanNearQuery really should be.

The test creates a single doc with one analyzed field, containing these
single-letter tokens, written as position:token pairs:

   0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k

Every token has a payload.

Then it makes:

    SpanTermQuery term=a
    SpanTermQuery term=k

Term "a" occurs four times (positions 0, 1, 3, 6) and "k" occurs twice
(positions 7, 8).
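
As a quick check of those counts, here is a minimal plain-Java sketch that
groups the positions by term. It does not use Lucene at all; the class name
TokenPositions and the helper positionsByTerm are made up for illustration,
and the token string is copied from the listing above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenPositions {
    // The test document as position:token pairs, copied from the message.
    static final String DOC =
        "0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k";

    // Hypothetical helper: group positions by term, in document order.
    static Map<String, List<Integer>> positionsByTerm(String doc) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (String pair : doc.trim().split("\\s+")) {
            String[] parts = pair.split(":", 2);
            int position = Integer.parseInt(parts[0]);
            result.computeIfAbsent(parts[1], t -> new ArrayList<>()).add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> byTerm = positionsByTerm(DOC);
        System.out.println("a: " + byTerm.get("a")); // a: [0, 1, 3, 6]
        System.out.println("k: " + byTerm.get("k")); // k: [7, 8]
    }
}
```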

My first question is: what spans is SpanNearQuery supposed to
enumerate?  Right now trunk does these four:

   span 0 to 8
   span 1 to 8
   span 3 to 8
   span 6 to 8

which pair the position-7 occurrence of "k" with every position of "a".
(Remember that a span's end is exclusive, i.e. 1 + the last matched
position, so "k" at position 7 yields an end of 8.)  Why is the
position-8 occurrence of "k" not included in any span?
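
To make the question concrete, the sketch below lists every pairing of an
"a" position with a later "k" position, using the same exclusive-end
convention. This is plain Java for illustration only, not Lucene's matching
algorithm: the class CandidateSpans and the method allPairings are
hypothetical, and the sketch ignores slop and ordering because the settings
used by the test are not stated here. Whether all of these pairings should
be reported is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

public class CandidateSpans {
    // Hypothetical helper: pair each "a" position with each later "k"
    // position and return the span as {start, end}, where end is exclusive
    // (1 + the position of "k").
    static List<int[]> allPairings(int[] aPositions, int[] kPositions) {
        List<int[]> spans = new ArrayList<>();
        for (int k : kPositions) {
            for (int a : aPositions) {
                if (a < k) {
                    spans.add(new int[] {a, k + 1});
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 3, 6}; // positions of "a" in the test doc
        int[] k = {7, 8};       // positions of "k" in the test doc
        for (int[] span : allPairings(a, k)) {
            System.out.println("span " + span[0] + " to " + span[1]);
        }
    }
}
```

Under these assumptions the sketch prints eight spans: the four ending at 8
that trunk reports, plus 0 to 9, 1 to 9, 3 to 9 and 6 to 9 for the "k" at
position 8.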

My second question is: when you call getPayload() on each span, what
should you get?  Right now trunk does this:

    span 0 to 8
      payload: pos: 0
      payload: pos: 7
    span 1 to 8
      payload: pos: 0
    span 3 to 8
      payload: pos: 3
    span 6 to 8
      payload: pos: 6

The first span properly includes the payload for "a" (pos: 0) and for
"k" (pos: 7), but the subsequent three do not include the payload
for "k".  Shouldn't you get all payloads associated w/ the span?

