lucene-java-user mailing list archives

From Bob Carpenter <c...@alias-i.com>
Subject Re: is there any n-gram analyzer available??
Date Tue, 21 Nov 2006 23:01:08 GMT
heritrix.lucene wrote:
> Thanks for your reply.
> 
> This analyzer creates combination of words. I am looking for analyzer where
> you can break up the words into their n-grams. For example:
> 2-grams of
> google -> go, oo, og, gl, le
> like that.

This is also easy.  You can check out our
sample in Gospodnetic and Hatcher's Lucene
in Action book if you want to stream them out.
If you're willing to collect them and then push
them out, it's even easier.  (Oh, how I wish
we had the yielding iterator construct of Python
in Java.)
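
To give a feel for the streaming version, here is a
minimal sketch of a plain Java iterator that hands out
character n-grams one at a time. It is not the code
from the book and it does not touch any Lucene API;
the class name NGramIterator and its fixed-n
constructor are made up for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: lazily yields the character n-grams of a string.
// Stands in for the "yielding iterator" idea; not a Lucene class.
final class NGramIterator implements Iterator<String> {
    private final String text;
    private final int n;
    private int pos = 0; // start offset of the next n-gram

    NGramIterator(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.text = text;
        this.n = n;
    }

    @Override
    public boolean hasNext() {
        // An n-gram fits only if n characters remain from pos.
        return pos + n <= text.length();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String gram = text.substring(pos, pos + n);
        pos++; // slide the window one character to the right
        return gram;
    }
}
```

Calling next() repeatedly on new NGramIterator("google", 2)
gives go, oo, og, gl, le, with nothing buffered beyond
the current position.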

Our version allows you to specify minimum n-gram
length and maximum n-gram length.  You
might want to put them in different fields
if you want weighting between them to be
easy.
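
A rough sketch of the collect-then-push variant with a
minimum and maximum length might look like the
following. Again this is an illustration only, not our
actual implementation: CharNGrams, ngrams, minN and
maxN are names I am assuming here, and hooking the
result into a Lucene analyzer or into per-length
fields is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects all character n-grams whose length
// lies between minN and maxN (inclusive). Not a Lucene class.
final class CharNGrams {
    private CharNGrams() {}

    static List<String> ngrams(String text, int minN, int maxN) {
        if (minN < 1 || maxN < minN) {
            throw new IllegalArgumentException("need 1 <= minN <= maxN");
        }
        List<String> out = new ArrayList<>();
        // Group by length so each length could go to its own field.
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }
}
```

CharNGrams.ngrams("google", 2, 2) returns
[go, oo, og, gl, le]. Because the output is grouped by
length, calling it once per n is an easy way to fill a
separate field for each n-gram length.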

- Bob Carpenter
   Alias-i

