lucene-java-user mailing list archives

From Saurabh Gokhale
Subject Text Categorization with Lucene (N-Gram technique)
Date Sun, 24 Jul 2011 16:38:10 GMT
Hi All,

I need to build an application that categorizes text (groups of sentences)
into multiple predefined categories.

From what I have found searching the internet, this should be possible in
theory with an n-gram based index: compute the n-grams of the incoming text
and match them against a known n-gram fingerprint for each category.
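
To make the idea concrete, here is a minimal sketch of one common form of
it: each category's fingerprint is a frequency-ranked list of character
n-grams, and an incoming text is assigned to the category whose ranking is
closest under an "out-of-place" distance. This is plain Python, not Lucene
code. The function names, the n-gram range of 1 to 3, the top_k cutoff, and
the tiny training texts are all assumptions made up for illustration.

```python
from collections import Counter


def ngram_profile(text, n_min=1, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    # Normalize case and whitespace, and pad so word boundaries form n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def categorize(text, fingerprints):
    """Return the category whose fingerprint is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(fingerprints, key=lambda cat: out_of_place(doc, fingerprints[cat]))


# Hypothetical training data: one short sample text per category.
training = {
    "sports": "the team won the match after scoring a late goal in the final game",
    "finance": "the bank raised interest rates and the stock market fell on the news",
}
fingerprints = {cat: ngram_profile(sample) for cat, sample in training.items()}

if __name__ == "__main__":
    print(categorize("the striker scored a goal and the team won", fingerprints))
```

With fingerprints this small the result is only a toy. In practice each
category would need a much larger body of sample text, and the n-gram range
and cutoff would need tuning.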

Does Lucene already have a contribution along these lines, for example in
the contrib directory? If not, is there an example elsewhere that I can
look at?

