lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From gaz77 <>
Subject Confused with NGRAM results
Date Thu, 28 Aug 2008 14:52:59 GMT


I'd appreciate if someone could explain the results I'm getting.

I've written a simple custom analyzer that applies the NGramTokenFilter to
the token stream during indexing. It's never applied during searching. The
purpose of this is to match sub-words.

Without the ngram filter, if I searched on the word 'postcode' it returns 2
documents. If I searched on 'code' it returns 6 documents (with no overlap
on the postcode results).

If I apply the ngram filter with min 1 and max 10, searching 'postcode'
returns the same 2 docs, while searching 'code' returns 9 docs. This sort-of
feels right. 

The problem comes when I set the min ngram size to 3, and the max to 5.
Searching 'postcode' returns no results (as expected), but searching 'code'
only returns 2 docs (the 2 normally returned by a 'postcode' search).

This last result for 'code' just doesn't seem correct - it should be
returning at least the 6 docs from the original search.

I'd really appreciate some advice on what is going on with the ngram filter.

View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message