mahout-user mailing list archives

From HorstItUpright <horstItUpri...@gmail.com>
Subject Which input formats to use for classifying WEKA's ARFF format?
Date Tue, 22 Nov 2011 16:03:40 GMT
Hello,

I am currently working on classification algorithms with Mahout. The
first step is to evaluate the different approaches that are already
available.

As far as I know, Mahout provides two Bayes algorithms and a Random
Forest (which, for some reason, is called Decision Forest; that is not
wrong, I know, but I find it confusing and inconsistent with the docs).

It appears to me (and I have also taken a look at the code) that none
of these approaches can handle the MVC format (which is what the
arff-vector converter produces when parsing WEKA ARFF files). The
Decision Forest (DF) is even more restrictive and requires the UCI
format.

My question now is: am I overlooking something? Is there a way to
convert the MVC files on the fly into the proper formats for the
algorithms?
I had expected that algorithms which have been part of Mahout for
quite a few revisions would accept more or less any Mahout input data,
or at least print some useful error messages.
The Bayes algorithms, for example, do run on the input data, but they
print a lot of strange output to the console during processing and do
not produce any usable results.

Am I right that I need to convert my ARFF or MVC files to the
UCI format or the "Bayes format" (the one used in the 20news example)?
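
To make the kind of conversion I have in mind concrete, here is a
minimal sketch in Python (not part of Mahout). It assumes a dense ARFF
file without quoted values that contain commas, and it assumes that
"UCI format" simply means one comma-separated instance per line with no
header; I have not verified that this is exactly what the Decision
Forest code expects. The function name arff_to_csv_lines is my own.

```python
def arff_to_csv_lines(arff_text):
    """Return the data rows of a dense ARFF document as comma-separated lines.

    Assumption: no sparse rows and no quoted values containing commas.
    """
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # Everything before @data is header (@relation, @attribute).
            if line.lower() == "@data":
                in_data = True
            continue
        if line.startswith("{"):
            raise ValueError("sparse ARFF rows are not handled in this sketch")
        # Normalize whitespace around the separators.
        rows.append(",".join(field.strip() for field in line.split(",")))
    return rows
```

Writing the returned lines to a text file would give a header-free,
comma-separated data set; which column holds the label would still
have to be specified separately.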

PS: I am using the latest checkout as well as the "official" 0.5 release.

Best regards,
Martin
