lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: [jira] Updated: (LUCENE-1448) add getFinalOffset() to TokenStream
Date Tue, 11 Nov 2008 20:29:06 GMT
This stuff is confusing! I think your numbers are not right. Let's try reformatting with CHAR=POS (each character followed by its offset).
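For reference, CHAR=POS listings like the ones below can be generated mechanically. This Java sketch is a hypothetical helper, not Lucene code. It concatenates the field's values into one logical character stream and labels every character with its offset. `CharPosTable`, `charPos`, and the `gap` parameter are illustrative names I am assuming: `gap` is the number of virtual space characters placed between values, so 0 gives the layout without the +1 and 1 gives the layout with the "virtual space character".

```java
import java.util.ArrayList;
import java.util.List;

class CharPosTable {
    /**
     * Concatenates the values of a multi-valued field into one logical
     * character stream and labels each character as CHAR=POS.
     * gap (hypothetical) = number of virtual space characters inserted
     * between values: 0 = no +1, 1 = the "virtual space character".
     */
    static String charPos(List<String> values, int gap) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                // Virtual separator between values; it belongs to no value.
                text.append(" ".repeat(gap));
            }
            text.append(values.get(i));
        }
        List<String> cells = new ArrayList<>();
        for (int pos = 0; pos < text.length(); pos++) {
            // A real or virtual space prints as " =N".
            cells.add(text.charAt(pos) + "=" + pos);
        }
        return String.join(" ", cells);
    }

    public static void main(String[] args) {
        List<String> values = List.of("abcd the", "crunch man");
        System.out.println(charPos(values, 0)); // without the +1
        System.out.println(charPos(values, 1)); // with the +1
    }
}
```

Running `main` prints the two listings, one per line: the first ends at n=17, the second (with the virtual space at position 8) ends at n=18.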

Here's your example without the +1:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3  =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13  =14 m=15 a=16 n=17

  abcd 0-4
crunch 8-14
   man 15-18

This is not how Lucene works today.  Lucene adds the +1 ("virtual
space character"):

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3  =4 t=5 h=6 e=7  =8 c=9 r=10 u=11 n=12 c=13 h=14  =15 m=16 a=17 n=18

  abcd 0-4
crunch 9-15
   man 16-19

I think?
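One way to double-check the arithmetic is to compute the offsets directly. This is a minimal sketch, not Lucene's indexing code, under these assumptions: each value is split on single spaces, "the" is dropped as a stop word (the token lists above contain no "the"), and each value's tokens are shifted by the total length of the preceding values plus `gap` virtual characters per value boundary. `MultiValueOffsets`, `offsets`, and `gap` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MultiValueOffsets {
    /**
     * Returns "term start-end" for each kept token. Offsets are character
     * positions (start inclusive, end exclusive) in the logical stream formed
     * by the field's values, with gap (hypothetical) virtual characters
     * between consecutive values.
     */
    static List<String> offsets(List<String> values, Set<String> stopWords, int gap) {
        List<String> out = new ArrayList<>();
        int base = 0; // offset of the current value's first character
        for (String value : values) {
            int start = 0;
            while (start < value.length()) {
                int end = value.indexOf(' ', start);
                if (end < 0) {
                    end = value.length();
                }
                String term = value.substring(start, end);
                if (!term.isEmpty() && !stopWords.contains(term)) {
                    out.add(term + " " + (base + start) + "-" + (base + end));
                }
                start = end + 1; // skip the space
            }
            // The next value starts after this one plus the virtual gap.
            base += value.length() + gap;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = List.of("abcd the", "crunch man");
        Set<String> stop = Set.of("the");
        System.out.println(offsets(values, stop, 0)); // without the +1
        System.out.println(offsets(values, stop, 1)); // with the +1
    }
}
```

With `gap` = 0 this gives abcd 0-4, crunch 8-14, man 15-18; with `gap` = 1 it gives abcd 0-4, crunch 9-15, man 16-19.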

Mark Miller wrote:

>
>> Is it? Let's straighten this out. Here is what I see from my test for:
>>
>> field = "abcd the"
>> field = "crunch man"
>>
>> abcd thecrunch man
>> a0b1c2d3 4t5h6e7c8r9u10n11c12h13  14m15a16n17
>>
>> Without the +1 I got:
>> abcd: 0-4
>> crunch: 7-13
>> man: 14-17
>>
>> Something like that anyway.
>>
>> With +1 I got:
>> abcd:0-4
>> crunch:8-14
>> man:14-18
> *correction*
> should have been - man: 15-18
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

