lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SpanNearQuery's spans & payloads
Date Sat, 12 Sep 2009 13:19:28 GMT
Mark Miller wrote:
>
>> Yeah I think you do, except each payload is only returned once.  So
>> it's only the first span that hits a payload that will return it.
>>
>> So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
>> enumerates the spans, eg I'll never see that 2nd occurrence of "k",
>> nor its associated payload.
>>   
>>     
> Not only not guaranteed, but it's just not going to happen - it's not
> how spans match. If I say find n within 300 of m with the following:
>
> n m m m m m m m m m m m m  m m m m m m m m m m m m m m m m m m m m m m
> m  m m m m m m m m m m m
>
> Only the first m will match. It will start at the left, find the n, then
> say great, an m within 300, this doc matches, we are done. There is
> not another n to start on or finish on to the right. It doesn't then
> touch the next 300 m's - that's just the way Doug implemented them, from
> what I can tell. It's only exhaustive from the left. Take "find m within
> 300 of n", where order matters (m first):
>
> m m m m m m m m m m m m m m m m m m n
>
> This will be a bunch of spans - start at the left - the first m to n
> matches, then the second m - n matches, then the third m to n matches,
> and so on as we move right.
>   
By the way, you can figure out what will match using the Span rules I
mentioned (at least I believe so).

Those rules are simple (this is my current working knowledge and I don't
guarantee it - but I haven't seen it be off yet) -

1. Only one span can start from a term.
2. Start matching from the left and work right.

So, applying them to your example:

  SpanNearQuery
    SpanTermQuery term=a
    SpanTermQuery term=k


0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>
>  span 0 to 8
>  span 1 to 8
>  span 3 to 8
>  span 6 to 8

So first we see the a at 0 - we draw our span because the k at 7 is
within 30: 0-8. We move right now, because we can't start at that term
again. Another a, at 1 - and again the k at 7 is within 30 - mark our
span: 1-8. Now we have to move right at least one term, but we don't find
the next a till 3 - again there is a k within 30 at 7 - mark our span:
3-8. Now move right at least one term - we find another a at 6 - again
there is a k within 30 at 7 - mark our span: 6-8. Now we are done. We
never needed or used the k at 8 (which ends at 9) in the Spans algorithm.
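
The walkthrough above can be sketched as a small toy model. This is not
Lucene's code, only an illustration of the two rules under some stated
assumptions: the function name enumerate_spans and its parameters are
hypothetical, "within N" is treated as a plain position difference, and each
start is closed at the nearest partner term at or to its right.

```python
from typing import Dict, List, Tuple


def enumerate_spans(
    positions: Dict[str, List[int]],
    first: str,
    second: str,
    slop: int,
) -> List[Tuple[int, int]]:
    """Toy model of the two rules; NOT Lucene's actual implementation.

    Returns (start, end) pairs with an exclusive end, so a term at
    position 7 ends at 8.
    """
    # Rule 2: every occurrence of either term is a candidate start,
    # visited from left to right.
    candidates = sorted(
        [(p, first) for p in positions.get(first, [])]
        + [(p, second) for p in positions.get(second, [])]
    )
    spans = []
    for start, term in candidates:
        other = second if term == first else first
        # Assumption: "within slop" means a plain position difference,
        # and the partner must sit at or to the right of the start.
        reachable = [
            p for p in positions.get(other, []) if start <= p <= start + slop
        ]
        if reachable:
            # Rule 1: at most one span per start, closed at the nearest partner.
            spans.append((start, min(reachable) + 1))
    return spans


# Positions of a and k from the example document above.
doc = {"a": [0, 1, 3, 6], "k": [7, 8]}
print(enumerate_spans(doc, "a", "k", slop=30))
# [(0, 8), (1, 8), (3, 8), (6, 8)]
```

In this model the k at 8 is never picked, because every a already finds the
nearer k at 7, and no span can start from a k since there is no a to its
right. The same function gives a single span for an "n m m m m m" layout and
one span per m for "m m m n".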

-- 
- Mark

http://www.lucidimagination.com



