mahout-user mailing list archives

From Tharindu Rusira <tharindurus...@gmail.com>
Subject Naive Bayes classification
Date Tue, 18 Mar 2014 13:22:56 GMT
Hi everyone,
I'm developing an application where I need to train a Naive Bayes
classification model and then use this model to classify new entities (in
this case, text files, based on their content).

I observed that the seqdirectory command always uses the file/directory name
as the "key" field for each document, and this key is then used as the label
in classification jobs.
This makes sense when I need to train a model and create the labelindex,
since I have organized my training data into separate directories, one per
label.
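The error message further down ("Label not found: input0.txt") suggests that the label is taken as the first path component of each seqdirectory key: a key like `/sports/doc1.txt` would yield `sports`, while a file placed directly in the input directory, with key `/input0.txt`, yields `input0.txt`. The class below is a minimal sketch of that assumed key-to-label rule; `LabelFromKey` and `labelOf` are hypothetical names, this is not Mahout's own code, and the exact key format may differ between Mahout versions.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Mahout) illustrating the assumed rule:
 * the label is the first path component of a seqdirectory key "/label/filename".
 */
public final class LabelFromKey {
    private static final Pattern SLASH = Pattern.compile("/");

    private LabelFromKey() {}

    /** Returns the first path component after the leading slash. */
    public static String labelOf(String key) {
        // "/sports/doc1.txt" splits into ["", "sports", "doc1.txt"]
        // "/input0.txt"      splits into ["", "input0.txt"]
        String[] parts = SLASH.split(key);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected key format: " + key);
        }
        return parts[1];
    }

    public static void main(String[] args) {
        System.out.println(labelOf("/sports/doc1.txt")); // sports
        System.out.println(labelOf("/input0.txt"));      // input0.txt
    }
}
```

Under this assumption, a directory-per-label layout produces meaningful labels for training, but an unlabelled file sitting at the top level simply gets its own file name as its "label".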

Now I'm trying to use this model to infer the best label for an unknown
document.
My requirement is to have Mahout read my new file and output the predicted
category, using the labelindex and the TF-IDF vector of the new content.
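Inferring "the best label" amounts to scoring the new document's TF-IDF vector against every label and mapping the index of the highest score back to a name through the label index. The sketch below shows only that final argmax step in plain Java; `BestLabel` and `bestLabel` are hypothetical names, and it assumes the per-label scores are already available as an array indexed like the labelindex (in Mahout they might come, for example, from a naive Bayes classifier's `classifyFull` method, depending on the version in use).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Mahout code): pick the label with the highest score. */
public final class BestLabel {
    private BestLabel() {}

    /**
     * @param labelIndex maps label index to label name, as a label index file might
     * @param scores     one score per label index, e.g. log-likelihoods
     */
    public static String bestLabel(Map<Integer, String> labelIndex, double[] scores) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        if (best < 0 || !labelIndex.containsKey(best)) {
            throw new IllegalArgumentException("No label for index " + best);
        }
        return labelIndex.get(best);
    }

    public static void main(String[] args) {
        Map<Integer, String> labelIndex = new HashMap<>();
        labelIndex.put(0, "sports");
        labelIndex.put(1, "politics");
        labelIndex.put(2, "tech");
        double[] scores = {-42.1, -37.5, -40.0};
        System.out.println(bestLabel(labelIndex, scores)); // politics
    }
}
```

Note that this step never looks at the document's key, so a file name such as `input0.txt` plays no role in the prediction.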
I tried creating vectors from the new content (with seqdirectory and
seq2sparse) and then running the testnb command on these vectors. But
unfortunately the seqdirectory command adds file names as labels, which does
not make sense when classifying unlabelled documents.

The following error message demonstrates this behavior.
input0.txt is the file name of my new document.

[main] ERROR com.me.classifier.mahout.MahoutClassifier - Error while classifying documents
java.lang.IllegalArgumentException: Label not found: input0.txt
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:182)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:205)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:209)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:173)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:70)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:160)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:125)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)
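The stack trace shows testnb feeding each result into a `ConfusionMatrix`, whose `getCount` fails a `Preconditions.checkArgument` check. A confusion matrix tallies (actual label, predicted label) pairs over a fixed set of known labels, so an "actual" label that was never in the label index, such as `input0.txt`, is rejected. The class below is a simplified, hypothetical stand-in (`TinyConfusionMatrix` is not Mahout's class) that reproduces this behaviour in plain Java.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified confusion matrix over a fixed set of labels. */
public final class TinyConfusionMatrix {
    private final Map<String, Integer> index = new HashMap<>();
    private final int[][] counts;

    public TinyConfusionMatrix(List<String> labels) {
        for (String label : labels) {
            index.put(label, index.size());
        }
        counts = new int[labels.size()][labels.size()];
    }

    /** Records one (actual, predicted) pair; labels must be known in advance. */
    public void addInstance(String actual, String predicted) {
        counts[indexOf(actual)][indexOf(predicted)]++;
    }

    public int getCount(String actual, String predicted) {
        return counts[indexOf(actual)][indexOf(predicted)];
    }

    private int indexOf(String label) {
        Integer i = index.get(label);
        if (i == null) {
            // Same kind of check that produces "Label not found: input0.txt"
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return i;
    }

    public static void main(String[] args) {
        TinyConfusionMatrix m = new TinyConfusionMatrix(Arrays.asList("sports", "politics"));
        m.addInstance("sports", "sports");
        System.out.println(m.getCount("sports", "sports")); // 1
        try {
            m.addInstance("input0.txt", "sports");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Label not found: input0.txt
        }
    }
}
```

Because an evaluation step like this needs a true label for every document, it fits labelled test sets rather than documents whose category is unknown.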


So how can I achieve what I'm trying to do here?

Thanks,


-- 
M.P. Tharindu Rusira Kumara

Department of Computer Science and Engineering,
University of Moratuwa,
Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
