lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SpanNearQuery's spans & payloads
Date Sat, 12 Sep 2009 13:21:55 GMT
Sorry for the spam - I made a typo of '8' instead of 'a' - it's hard enough
to follow without that - read this one below instead:

Mark Miller wrote:
> Mark Miller wrote:
>   
>>> Yeah I think you do, except each payload is only returned once.  So
>>> it's only the first span that hits a payload that will return it.
>>>
>>> So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
>>> enumerates the spans, eg I'll never see that 2nd occurrence of "k",
>>> nor its associated payload.
>>>   
>>>     
>>>       
>> Not only is it not guaranteed, it's just not going to happen - it's not
>> how spans match. If I say find n within 300 of m with the following:
>>
>> n m m m m m m m m m m m m  m m m m m m m m m m m m m m m m m m m m m m
>> m  m m m m m m m m m m m
>>
>> Only the first m will match. It will start at the left, find the n, then
>> say great, an m within 300, this doc matches, we are done. There is
>> not another n to start on or finish on to the right. It doesn't then
>> touch the remaining m's - that's just the way Doug implemented them, from
>> what I can tell. It's only exhaustive from the left. Now find m within
>> 300 of n, where order matters (m first):
>>
>> m m m m m m m m m m m m m m m m m m n
>>
>> This will be a bunch of spans - start at the left - the first m to n
>> matches, then the second m to n matches, then the third m to n matches,
>> and so on as we move right.
>>   
>>     
> You can figure out what will match using the Span rules I mentioned, by
> the way (at least I believe so).
>
> Those rules are simple (this is my current working knowledge and I don't
> guarantee it - but I haven't seen it be off yet) -
>
> 1. Only one span can start from a term.
> 2. Start matching from the left and work right.
>
> so applying to your example:
>
>   SpanNearQuery
>     SpanTermQuery term=a
>     SpanTermQuery term=k
>
>
> 0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k
>   
>>  span 0 to 8
>>  span 1 to 8
>>  span 3 to 8
>>  span 6 to 8
>>     
>
> So first we see 0, which is an a - we draw our span because the k at 7
> is within 30: 0-8.
> We move right now, because we can't start at that term again.
> Another a - and again the k at 7 is within 30 - mark our span: 1-8.
> Now we have to move right by at least one term, but we don't find the
> next a till 3 - again there is a k within 30 at 7 - mark our span: 3-8.
> Now move right by at least one term - we find another a at 6 - again
> there is a k within 30 at 7 - mark our span: 6-8.
> Now we are done. We never needed or used the k at 8 (ends at 9) in the
> Spans algorithm.
>
>   
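The two rules and the walkthrough above can be modelled in a few lines of
Python. This is only a sketch of the rules as stated, not Lucene's actual
NearSpans code. The helper names near_spans and parse are made up for
illustration, the query is treated as ordered (first term before second),
and "within slop" is assumed to mean that at most slop positions sit
between the two terms. Span ends are exclusive, as in the spans listed above.

```python
def parse(text):
    """Turn 'pos:term' tokens into (position, term) pairs."""
    return [(int(pos), term)
            for pos, term in (tok.split(":") for tok in text.split())]


def near_spans(tokens, first, second, slop):
    """Enumerate (start, end) spans for 'first' followed by 'second'.

    tokens is a list of (position, term) pairs; several terms may share
    a position. The end of each span is exclusive.
    """
    starts = sorted({pos for pos, term in tokens if term == first})
    ends = sorted({pos for pos, term in tokens if term == second})
    spans = []
    for start in starts:                 # rule 2: work from left to right
        for end in ends:
            gap = end - start - 1        # assumed meaning of "within slop"
            if end > start and gap <= slop:
                spans.append((start, end + 1))
                break                    # rule 1: one span per starting term
    return spans


doc = "0:a 1:a 1:b 2:c 2:d 3:e 3:a 4:f 4:g 5:h 5:i 6:j 6:a 7:b 7:k 8:k"
print(near_spans(parse(doc), "a", "k", 30))
# [(0, 8), (1, 8), (3, 8), (6, 8)]

# One n followed by many m's: only the first m is ever used.
n_then_ms = [(0, "n")] + [(i, "m") for i in range(1, 48)]
print(near_spans(n_then_ms, "n", "m", 300))
# [(0, 2)]
```

In this model the k at position 8 is never reached, because each a stops at
the first k it can pair with. Running the same function on a row of m's
followed by a single n (m first) gives one span per m, all ending at the n.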


-- 
- Mark

http://www.lucidimagination.com



