lucene-dev mailing list archives

From Hiroaki Kawai <ka...@apache.org>
Subject Re: NGrams and positions
Date Fri, 16 May 2008 15:49:31 GMT
I don't think this is a question of what n-grams are in general.

NGramTokenFilter is a TokenFilter, and it produces a
TRICKY token stream because the text has been processed by
more than one tokenizing step.

This discussion is about the mechanism of TokenFilter
itself.

In the current implementation, NGramTokenFilter creates a
token stream so tricky that one might take it for a new
variety of n-gram.

The token stream generated by NGramTokenFilter is the
product not only of n-gram tokenization but also of
whatever other tokenizers run before it, so the token stream
might not look like a normal n-gram stream.

I think Grant is talking about StandardTokenizer + NGramTokenFilter, 
isn't he?
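
To make the difference concrete, here is a rough sketch in plain Python.
It does not use Lucene at all: word_tokens is only a crude stand-in for
StandardTokenizer, and all function names are made up for illustration.
It compares n-grams taken inside each word token (the tokenizer + filter
chain) with n-grams taken over the whole input, and it ignores positions
entirely.

```python
import re


def word_tokens(text):
    """Split text into word tokens (hypothetical stand-in for StandardTokenizer)."""
    return re.findall(r"\w+", text)


def ngrams(term, n):
    """Return all contiguous character n-grams of a single string."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def ngrams_per_token(text, n):
    """N-grams computed inside each word token, as a filter after a tokenizer would."""
    return [gram for tok in word_tokens(text) for gram in ngrams(tok, n)]


def ngrams_whole_input(text, n):
    """N-grams computed over the raw input, as a standalone n-gram tokenizer would."""
    return ngrams(text, n)


if __name__ == "__main__":
    text = "ab cd"
    print(ngrams_per_token(text, 2))    # ['ab', 'cd']
    print(ngrams_whole_input(text, 2))  # ['ab', 'b ', ' c', 'cd']
```

The per-token stream never contains a gram that crosses a word boundary,
which is one reason it might not look like a normal n-gram stream.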


Grant Ingersoll <gsingers@apache.org> wrote:
> On May 16, 2008, at 11:13 AM, Hiroaki Kawai wrote:
> 
> > I think LUCENE-1224 is more complex than LUCENE-1225.
> >
> > First, I want to solve LUCENE-1225. It might be more
> > simple to understand.
> >
> > For LUCENE-1224, I came to the same issue. My current
> > understanding is this comes from a mismatch between TokenFilter and position.
> > I apologize that the patch is confusing. I'm aware that the patch
> > still has another issue.
> 
> The patch itself isn't confusing, IMO (the only issue with the patch  
> is the unit test, but that is for the JIRA discussion).  I think it  
> does what it says it does.  This discussion is more philosophical as  
> to what kinds of things people want to do with ngrams in general.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



