lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: word + ngram tokenization
Date Tue, 05 Apr 2011 23:37:01 GMT
Hi Shambhu,

ShingleFilter will construct word n-grams:

http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
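To make "word n-grams" concrete, here is a minimal plain-Java sketch of what a three-word shingle over a token list looks like. It does not use Lucene and is not ShingleFilter's implementation; the class name WordShingles, the method shingles, and the input token list (the example sentence with "the" already removed, as a stop filter might do) are assumptions made for illustration. For the real filter, check the linked Javadoc for its shingle-size and unigram-output settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: builds word n-grams ("shingles")
// from a list of tokens that has already been tokenized.
public class WordShingles {

    // Returns every run of `size` consecutive tokens, joined by single spaces.
    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: the example sentence after word tokenization,
        // with "the" dropped as a stop word.
        List<String> tokens = Arrays.asList(
                "quick", "brown", "fox", "jumped", "over", "lazy", "dog");
        for (String shingle : shingles(tokens, 3)) {
            System.out.println(shingle);
        }
    }
}
```

With that assumed input, the first three lines printed are "quick brown fox", "brown fox jumped" and "fox jumped over", which matches the output requested in the original question.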

Steve

> -----Original Message-----
> From: sham singh [mailto:shamsingh4u@gmail.com]
> Sent: Tuesday, April 05, 2011 5:53 PM
> To: java-user@lucene.apache.org
> Subject: word + ngram tokenization
> 
> Hi All,
> 
> I need tokenization that combines n-gram and standard tokenization.
> For example, if the content is "the quick brown fox jumped over the lazy dog",
> the requirement is to tokenize it into:
> quick brown fox
> brown fox jumped
> fox jumped over
> etc.
> 
> Please help me find the best analyzer for this requirement.
> 
> Thanks in Advance
> 
> --
> Many Thanks,
> Shambhu
> 
