lucene-general mailing list archives

From phongtvcc <>
Subject Index writer for Ngram
Date Mon, 26 Mar 2012 14:31:08 GMT
I want to create an IndexWriter that indexes character n-grams. For example,
given the text "Lucene is a great language", I want to use n-grams with n=3
to get: Luc uce cen ene is a gre rea eat....
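
To make the expected output concrete, here is a minimal plain-Java sketch
(no Lucene involved) of the per-word splitting described above. The class
name CharNGrams and the method wordNGrams are hypothetical names used only
for illustration, and keeping words shorter than n whole is an assumption
read off the example ("is" and "a" appear unchanged). It is not meant to
reproduce what NGramTokenizer does internally.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helper, not part of Lucene.
public class CharNGrams {

    /** Splits text on whitespace and returns the character n-grams of each word. */
    static List<String> wordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is empty or all whitespace
            }
            if (word.length() <= n) {
                // Assumption from the example: short words such as "is" and "a" are kept whole.
                grams.add(word);
                continue;
            }
            // Slide a window of n characters across the word.
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints: Luc uce cen ene is a gre rea eat
        System.out.println(String.join(" ", wordNGrams("Lucene is a great", 3)));
    }
}
```

This is the token stream I would like to end up with in the index.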

my code:

IndexWriter writer = new IndexWriter(INDEX_DIR, new
PositionalPorterStopAnalyzer(), true,
Reader reader = new FileReader(f);
Document doc = new Document();
NGramTokenizer token = new NGramTokenizer(reader, 3, 3);
doc.add(new Field("contents", new FileReader(f)));
doc.add(new Field("vector",token,Field.TermVector.YES));

With the above code, the IndexWriter only indexes a token made of 3 extracted
characters, not the n-grams I expect.
Can anyone help me with this issue? The NGramTokenizer above seems to extract
only 3 characters rather than all of the 3-character n-grams.
Thanks very much in advance for your help.
