mahout-user mailing list archives

From John Meagher <john.meag...@gmail.com>
Subject Arff files to Naive Bayes
Date Wed, 07 Aug 2013 21:00:37 GMT
I'm just starting to work with Mahout and I'm struggling to get an
example of a non-text-based Naive Bayes classifier up and running.
The input will be feature vectors generated outside of Mahout.  As a
test I'm using ARFF files (any other CSV-like format would work too).
I've been able to convert the data into vectors in a few different
ways, but I can't figure out what is needed to get the trainnb command
to work.

Does the label index need to be generated manually, or by some tool
other than the arff.vector or trainnb commands?

Is there a specific format needed for the input ARFF files, such as
specific columns in a specific order?


Here's what I've tried so far in both 0.7 from CDH4 and 0.8 direct from Apache:

$ wget http://repository.seasr.org/Datasets/UCI/arff/iris.arff
$ mahout arff.vector --input iris.arff --output iris.model --dictOut iris.labels

This works, and the output seems to be right so far.

This is the command I think I need to train the Naive Bayes model.  It
fails when creating the label index with the exception below.

$ mahout trainnb -i iris.model/ -o iris.training -el -li iris.training.labels
...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:123)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:180)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
...
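
My unconfirmed guess is that the label index step derives each label
from the sequence file key by splitting it on "/" and taking the second
segment, and that the keys written by arff.vector have no such segment.
The sketch below only illustrates how that would produce this exception.
The class name, the helper labelFromKey, and the key layout
"/label/id" are all assumptions for illustration, not Mahout code.

```java
public class LabelKeyCheck {

    // Hypothetical helper (not a Mahout API): treats the second
    // "/"-separated segment of a key as the class label.
    static String labelFromKey(String key) {
        return key.split("/")[1];
    }

    public static void main(String[] args) {
        // A key in the assumed "/label/id" layout yields the label.
        System.out.println(labelFromKey("/Iris-setosa/0"));

        // A key with no "/" splits into a single element, so index 1
        // is out of range and the lookup throws.
        try {
            labelFromKey("0");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("key has no label segment");
        }
    }
}
```

If that guess is right, the vectors would need keys that carry the
label before trainnb can build the index with -el.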


Thanks for the help,
John
