lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Text Categorization with Lucene (N-Gram technique)
Date Tue, 26 Jul 2011 15:36:01 GMT
Lucene has support for n-grams during indexing and querying.  The rest would have to be done
by you.
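
For illustration only, here is a minimal plain-Java sketch of the matching step described in the
quoted question below: build a character n-gram profile per category and pick the category whose
profile is closest to the incoming text.  It does not use Lucene at all.  The class and method
names (NGramCategorizer, profile, cosine, categorize) are hypothetical, and cosine similarity is
just one assumed choice of distance; the thread does not name one.  Lucene's n-gram analysis
could be used to produce the n-grams instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: character n-gram fingerprints compared by cosine similarity.
public class NGramCategorizer {

    // Count overlapping character n-grams of the lower-cased text.
    public static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two count profiles; 0.0 if either is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the category whose fingerprint is most similar to the text,
    // or null if there are no fingerprints.
    public static String categorize(String text,
                                    Map<String, Map<String, Integer>> fingerprints,
                                    int n) {
        Map<String, Integer> p = profile(text, n);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : fingerprints.entrySet()) {
            double score = cosine(p, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy fingerprints built from made-up sample text, for demonstration only.
        Map<String, Map<String, Integer>> fingerprints = new HashMap<>();
        fingerprints.put("sports", profile("football match goal team score league", 3));
        fingerprints.put("finance", profile("stock market shares bank interest rate", 3));
        System.out.println(categorize("the team scored a late goal", fingerprints, 3));
    }
}
```

In practice each fingerprint would be built from a body of training text per category rather
than a single sentence, and the n-gram size would need tuning.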

<shamelessPlug>Taming Text chapter 7 has some basic implementations using Lucene to
do categorization.  http://www.manning.com/ingersoll</shamelessPlug>

-Grant

On Jul 24, 2011, at 12:38 PM, Saurabh Gokhale wrote:

> Hi All,
> 
> I need to work on an application where I have to categorize text (groups of
> sentences) into multiple pre-defined categories.
> 
> As I understand from searching the internet, this is theoretically
> possible with an n-gram based index, by matching the n-grams of the incoming
> text against the known fingerprint of each category.
> 
> I wanted to know if Lucene already has any contribution in this regard
> that I can find in the contrib directory, or if there is any example that I can
> look at elsewhere.
> 
> Saurabh

--------------------------
Grant Ingersoll



