mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: mahout text mining
Date Fri, 17 Jan 2014 04:08:35 GMT
See http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
for an example of classifying Twitter messages.
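The linked post trains Mahout's own naive Bayes classifier. To show roughly what such a model computes, here is a minimal multinomial naive Bayes with add-one smoothing in plain Python. It is not Mahout's implementation. It assumes the paragraph/label pairs from the question below are already loaded in memory; the `MultinomialNB` class, the `tokenize` helper, and the toy training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class (for the prior)
        self.word_counts = defaultdict(Counter)  # class -> term frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior P(c)
            denom = self.total_words[c] + v
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data in the (paragraph, class) shape from the question.
train = [
    ("great movie, loved the acting", "pos"),
    ("wonderful and moving story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("awful, a waste of time", "neg"),
]
docs, labels = zip(*train)
model = MultinomialNB().fit(docs, labels)
print(model.predict("loved this wonderful story"))  # pos
```

Working in log space avoids underflow when a paragraph has many words; with real data you would also hold out part of the file to measure accuracy.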

Lucene supports n-grams, stop words, the Porter and Snowball stemmers, language-specific
analyzers, etc.
Mahout uses Lucene for vectorization (part of Mahout's seq2sparse process).
See http://mahout.apache.org/users/basics/creating-vectors-from-text.html
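To make the analyzer chain concrete, the sketch below mimics in plain Python the steps a Lucene analyzer applies before seq2sparse builds vectors: tokenize, lowercase, drop stop words, stem, then emit word n-grams. It is not Lucene's API. The stop-word list is a hypothetical handful of words, and `crude_stem` is a toy suffix stripper standing in for the Porter/Snowball stemmers, not either algorithm.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; Lucene's lists are much longer.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "the", "to"}


def crude_stem(token):
    # Toy suffix stripper, NOT Porter or Snowball: drop one common suffix
    # if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, ngram_max=2):
    tokens = re.findall(r"[a-z0-9']+", text.lower())     # tokenize + lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word filter
    tokens = [crude_stem(t) for t in tokens]             # stemming
    grams = []
    for n in range(1, ngram_max + 1):                    # word n-grams (shingles)
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i : i + n]))
    return grams


terms = analyze("Dogs barked at the cats")
print(terms)         # ['dog', 'bark', 'cat', 'dog bark', 'bark cat']
tf = Counter(terms)  # raw term-frequency vector for this document
```

In Mahout itself you would choose a Lucene analyzer and a maximum n-gram size through seq2sparse's options (see the page linked above) instead of writing this by hand; seq2sparse can then weight the counts as TF or TF-IDF.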

On Thursday, January 16, 2014 10:57 PM, qiaoresearcher <qiaoresearcher@gmail.com> wrote:
 
Mahout has an example that uses naive Bayes to classify the 20 Newsgroups data set, but how
do I classify individual paragraphs (e.g., Twitter messages, movie reviews) stored in text
files with content like this:
----------------------------------------------------------
text paragraph 1                     class a
text paragraph 2                     class b
text paragraph 3                     class a
text paragraph 4                     class b
.............                                      ...

Does it support n-grams, stemming, stop words, etc.?

Thanks for any suggestions.