lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: SpanNearQuery's spans & payloads
Date Fri, 11 Sep 2009 20:02:28 GMT
Thanks Mark! -- comments below:

On Fri, Sep 11, 2009 at 3:34 PM, Mark Miller <markrmiller@gmail.com> wrote:

> I'd have to dig in to be of much help. Hard to remember this stuff.
>
> 0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>
>  span 0 to 8
>  span 1 to 8
>  span 3 to 8
>  span 6 to 8
>
> I think those are the right 4. You start on the left and work
> right. Spans always start after the last one started.

OK, so SpanNearQuery always takes its left-most clause, releases a
span, and then advances it?  What if there is a tie for two left-most
clauses?

Eg if I had included "b" as a clause, here, it'd tie with "a" at
position 1 -- hmm, I just tested this: you get "span 1 to 8" twice:

    span 0 to 8
       payload: pos: 7
       payload: pos: 1
       payload: pos: 0
    span 1 to 8
       payload: pos: 0
    span 1 to 8
       payload: pos: 3
    span 3 to 8
       payload: pos: 6
    span 6 to 8
       payload: pos: 6

Also, the payloads sort of shifted down (eg "pos: 3" now shows up in
the "span 1 to 8" but before showed up in "span 3 to 8"), and "pos: 1"
(for b) was added under "span 0 to 8".

(NOTE: confusingly, the "payload: pos: N" is off by one, in this test,
ie the "real" position is N+1).
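
To make the observed behavior concrete, here is a rough model in plain
Java. It is not the actual Lucene code, and every name in it is made up.
Assumptions: each clause is a sorted list of positions taken from the
token listing above, each occurrence is a one-position span with an
exclusive end, only the clause with the left-most start is advanced
after each match, and a tie goes to the clause listed first. It skips
the slop check, which is safe here only because the test's slop of 30
is wider than any span in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the behavior observed in this thread; NOT Lucene code.
class SpanNearSketch {

    /**
     * clauses[i] holds the sorted positions of clause i within one document.
     * Every occurrence is treated as the span [pos, pos + 1).
     * Returns the emitted spans as "start-end" strings, end exclusive.
     */
    static List<String> nearSpans(int[][] clauses) {
        List<String> out = new ArrayList<>();
        for (int[] c : clauses) {
            if (c.length == 0) return out; // a clause with no hits: no spans
        }
        int[] idx = new int[clauses.length]; // current occurrence per clause
        while (true) {
            int min = 0;    // clause whose current start is left-most
            int maxEnd = 0; // largest exclusive end over all clauses
            for (int i = 0; i < clauses.length; i++) {
                int pos = clauses[i][idx[i]];
                if (pos < clauses[min][idx[min]]) min = i; // ties keep the earlier clause
                maxEnd = Math.max(maxEnd, pos + 1);
            }
            out.add(clauses[min][idx[min]] + "-" + maxEnd);
            // Advance only the left-most clause; stop once it runs out.
            if (++idx[min] == clauses[min].length) return out;
        }
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 3, 6}, b = {1, 7}, k = {7, 8};
        System.out.println(nearSpans(new int[][] {a, k}));
        System.out.println(nearSpans(new int[][] {a, b, k}));
    }
}
```

Under those assumptions the (a, k) input yields the four spans Mark
listed, and adding b yields "1-8" twice, as in my test output. That
agreement suggests, but does not prove, that this is how SpanNearQuery
advances internally. It also shows why 9 never appears: "a" runs out
before "k" is ever the left-most clause, so "k" is never advanced.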

> So first you would find: 0 to 8. After 0, 1 to 8.
> After 1, 3 to 8, and after 3, 6 to 8. That makes sense.
> You never see 9 because the 8 comes first and you can
> end as many times on a pos as you want - but you don't
> ever start a span at the same pos. So I think this is right.

I think (if I were using SpanNearQuery) I'd want it to somehow include
9, but I'm not quite sure how.  This test sets slop to 30, so maybe
I'd want to see 0-9, 1-9, 3-9, 6-9?  Ie the "maximal" spans possible.
As it stands, eg, my app will never see "k"'s payload from its
occurrence at position 8.

> The second question I am less sure about without looking at code.
> I think it's because each payload can only be loaded once. So the first
> time you hit 0 to 8, you get both payloads - but every other span that
> hits 8, that payload was already loaded? So you get all of the payloads
> you should, you're just not getting duplicates in each span. I'd have to think
> harder about it - but overall it appears right ... ?

Yeah that is the reason why you only see each payload once, but I'm
not sure that's "right".  I guess an app can always store away each
payload and pull it later, but eg if the app wants to score each span
using the payloads from all occurrences of clauses within it, you
can't trust getPayloads for that.
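
The "store away each payload" workaround could look roughly like the
sketch below. It is a hypothetical app-side helper, not a Lucene API,
and the class and method names are invented for illustration. It
assumes the app can work out each payload's position (in this test the
payload bytes happen to encode it); as far as I know the spans hand
back only the payload bytes, so that part is left to the app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical app-side helper; not part of Lucene.
class PayloadCacheSketch {
    // Payloads seen so far in the current document, keyed by token position.
    private final TreeMap<Integer, List<byte[]>> byPosition = new TreeMap<>();

    // Record a payload the first (and only) time the spans return it.
    void remember(int position, byte[] payload) {
        byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(payload);
    }

    // All remembered payloads with start <= position < end (end exclusive).
    List<byte[]> payloadsWithin(int start, int end) {
        List<byte[]> out = new ArrayList<>();
        for (List<byte[]> list : byPosition.subMap(start, end).values()) {
            out.addAll(list);
        }
        return out;
    }

    // Call when moving on to the next document.
    void clear() {
        byPosition.clear();
    }
}
```

The app would call remember() for every payload as it arrives and then
ask payloadsWithin(span start, span end) when scoring, so a later span
still sees payloads that were only delivered with an earlier one.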

> All the Spans are subspans of a larger Span right?

Not sure what you mean here?

Mike
