lucene-java-user mailing list archives

From gaz77 <>
Subject Re: Confused with NGRAM results
Date Thu, 28 Aug 2008 22:50:46 GMT

Thanks for the pointer.

I've gone into this in some depth, using the AnalyzerUtils class from the
Lucene in Action book.

It seems that the NGramTokenFilter only processes part of the string that
goes in: it stops tokenising part way through, so the later words produce no
ngrams. That's why the documents weren't found in the results.

I've had a look at the source code, and I think it's because the next()
method returns null when it hits a token shorter than the minimum ngram size.
For example, if I set the minimum to 3, then a 2-character token will cause
it to return null, which ends the token stream.
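
To illustrate what I think is happening, here is a rough sketch in plain
Java. It does not use Lucene at all and is not the NGramTokenFilter source;
the class and method names (NGramSketch, stopAtShortToken, skipShortTokens)
are made up for this example. It only models the two behaviours: giving up at
the first token shorter than the minimum, versus skipping that token and
carrying on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behaviour described above; not Lucene code.
public class NGramSketch {

    // Adds every n-gram of length minGram..maxGram taken from a single token.
    public static void addGrams(String token, int minGram, int maxGram, List<String> out) {
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                out.add(token.substring(start, start + n));
            }
        }
    }

    // Models the suspected behaviour: the first token shorter than minGram
    // ends the whole stream, much as a null from next() would.
    public static List<String> stopAtShortToken(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                return out; // everything after this token is lost
            }
            addGrams(token, minGram, maxGram, out);
        }
        return out;
    }

    // Models the behaviour I expected: short tokens are skipped and the
    // remaining tokens are still processed.
    public static List<String> skipShortTokens(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                continue; // nothing to emit for this token, move on
            }
            addGrams(token, minGram, maxGram, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("big", "at", "dog");
        System.out.println(stopAtShortToken(tokens, 3, 3)); // prints [big]
        System.out.println(skipShortTokens(tokens, 3, 3));  // prints [big, dog]
    }
}
```

With a minimum of 3, the 2-character token "at" cuts the first version off
before it reaches "dog", which matches what I see in the analyzer output.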

I'm not sure if this is by design or a bug. Either way, at least I know
what's causing it now.

