lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SpanNearQuery's spans & payloads
Date Fri, 11 Sep 2009 21:04:01 GMT
Just FYI, I recall a fair amount of discussion on SpanNear:

http://www.lucidimagination.com/search/s:email/l:dev?q=SpanNearQuery
http://www.lucidimagination.com/search/?q=NearSpansOrdered#/s:email/l:dev
See also http://issues.apache.org/jira/browse/LUCENE-1001

I remember being very confused by NearSpansOrdered and NearSpansUnordered,
and also thinking there are some oddities (scoring notwithstanding).

On Sep 11, 2009, at 2:32 PM, Michael McCandless wrote:

> Under LUCENE-1458, I'm hitting a curious test failure in
> TestPositionsIncrement.testPayloadsPos0.  The failure happens because
> the codec I'm testing (pulsing codec) allows you to retrieve the same
> payload more than once if the term was pulsed (inlined into terms
> dict), whereas w/ trunk you can only retrieve the payload once.
>
> But in debugging the failure, I'm struggling with what the correct
> behavior of SpanNearQuery really should be.
>
> The test creates a single doc with one analyzed field, with these
> single-letter tokens, written as position:token:
>
>   0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>
> every token has a payload.
>
> Then it makes:
>
>  SpanNearQuery
>    SpanTermQuery term=a
>    SpanTermQuery term=k
>
> Term "a" occurs four times (positions 0, 1, 3, 6) and "k" occurs 2
> times (positions 7, 8).
>
> My first question is: what spans is SpanNearQuery supposed to
> enumerate?  Right now trunk does these four:
>
>   span 0 to 8
>   span 1 to 8
>   span 3 to 8
>   span 6 to 8
>
> which pair the position-7 occurrence of "k" with every position of "a".
> (Remember that a span's end is exclusive, last position + 1, so "k" at
> position 7 gives an end of 8.)  How come the position 8 occurrence of
> "k" was not included in any spans?
>
> My second question is: when you call getPayload() on each span, what
> should you get?  Right now trunk does this:
>
>    span 0 to 8
>      payload: pos: 0
>      payload: pos: 7
>    span 1 to 8
>      payload: pos: 0
>    span 3 to 8
>      payload: pos: 3
>    span 6 to 8
>      payload: pos: 6
>
> The first span properly includes the payload for "a" (pos: 0) and for
> "k" (pos: 7), but the subsequent three do not include the payload
> for "k".  Shouldn't you get all payloads associated w/ the span?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
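
To make the two questions above concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not reproduce what NearSpansOrdered or NearSpansUnordered actually do. It parses the token list from the quoted message, then pairs every "a" position with every "k" position, using the exclusive-end convention (last position + 1). The class name SpanSketch and its methods are hypothetical, and slop is ignored entirely, which is an assumption, since the message does not say what slop the test uses.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSketch {
    // The test document from the quoted message, as position:token pairs.
    static final String DOC =
        "0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k";

    /** Returns the positions at which the given term occurs in DOC. */
    static List<Integer> positionsOf(String term) {
        List<Integer> out = new ArrayList<>();
        for (String tok : DOC.split(" ")) {
            String[] parts = tok.split(":");
            if (parts[1].equals(term)) {
                out.add(Integer.parseInt(parts[0]));
            }
        }
        return out;
    }

    /**
     * Pairs every "a" occurrence with every "k" occurrence.
     * Assumption: no slop limit is applied. The span end is exclusive
     * (last matched position + 1), and each pairing lists the positions
     * whose payloads one might expect to see for that span.
     */
    static List<String> allPairings() {
        List<String> spans = new ArrayList<>();
        for (int a : positionsOf("a")) {
            for (int k : positionsOf("k")) {
                int start = Math.min(a, k);
                int end = Math.max(a, k) + 1;
                spans.add("span " + start + " to " + end
                        + " payloads at pos " + a + ", " + k);
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println("a: " + positionsOf("a"));
        System.out.println("k: " + positionsOf("k"));
        for (String s : allPairings()) {
            System.out.println(s);
        }
    }
}
```

Exhaustive pairing gives eight candidates: four ending at 8 (the ones trunk reports) and four ending at 9 (the ones involving "k" at position 8). Whether the latter should be enumerated, and whether each span should carry both payloads as listed here, is exactly the open question; the sketch only spells out the candidates.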

