mahout-user mailing list archives

From Martin Häger <martin.ha...@byburt.com>
Subject Re: Classifying general Attribute-Relation data using Mahout
Date Wed, 10 Feb 2010 08:54:37 GMT

I went ahead and attached everything I sent to Robin to MAHOUT-286.

2010/2/9 Robin Anil <robin.anil@gmail.com>:
> I have the data. I will upload it shortly.
>
>
> On Wed, Feb 10, 2010 at 12:10 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> Martin,
>>
>> I saw only one attachment here.  The other may have been stripped by the
>> mailing list, which prefers not to have attachments.
>>
>> I have filed an issue for this at
>> https://issues.apache.org/jira/browse/MAHOUT-286
>>
>> Can you attach your data files there so that we can work on getting a
>> better resolution for you?
>>
>> On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <martin.hager@byburt.com> wrote:
>>
>> > Hi Robin,
>> >
>> > The attached data.arff contains the test data, data.training.arff
>> > contains the training data. We're running the svn trunk (r906954) of
>> > Mahout. The attached script run.sh shows how we run it.
>> > Is it possible to run Mahout's NaiveBayes classifier on this data in
>> > this way, or is it limited to text documents?
>> >
>> > Side note: We're expecting Weka to report 100% incorrect
>> > classification, since all test data is labeled "unknown" whereas the
>> > training data is labeled either "valid" or "invalid". (In fact, the
>> > test data is the entire "invalid" set, so Weka actually classifies
>> > everything correctly.) We're not yet sure what class to put on the
>> > test data, since we of course can't know anything about it in advance
>> > (hence the "unknown").
>> >
>> > 2010/2/8 Robin Anil <robin.anil@gmail.com>:
>> > > Can you send the training and test data to me? Are you using the 0.2
>> > > release or the trunk?
>> > >
>> > > It seems the model wasn't built, as there was an error:
>> > >
>> > > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>> > > Input path does not exist: file:/tmp/hadoop/model/trainer-termDocCount
>> > > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
>> > > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
>> > >
>> > > So there is no point in running the classifier.
>> > >
>> > > Weka doesn't seem to be doing well either.
>> > >
>> > >
>> > >
>> > > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <martin.hager@byburt.com> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> We're experimenting a bit with Weka and Mahout. Our input data is a
>> > >> relation in ARFF format (see attached data.training.arff), and we'd
>> > >> like to classify it using Mahout. However, it seems (to us, at first)
>> > >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
>> > >> centered around documents of text, and not general attribute data.
>> > >> Thus, running the classifier causes our ARFF data to be interpreted as
>> > >> a document of words, with not very useful results (see attached
>> > >> mahout.log).
>> > >>
>> > >> With Weka, we're able to get the results we want (see attached
>> > >> weka.log).
>> > >>
>> > >> Any suggestions for how to get this working?
>> > >>
>> > >> Thanks!
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>
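
The thread's core problem is that a text-oriented Bayes trainer reads raw ARFF as a bag of words. One possible workaround, sketched here and not taken from the thread, is to preprocess each ARFF row into a pseudo-document of "attribute=value" tokens with the class label in front. The function names, the "class" attribute name, and the tab-separated "label, then tokens" output layout are all assumptions for illustration; check the input format your Mahout version actually expects before relying on it. The parser handles only simple dense ARFF (no sparse rows, no quoted values containing commas or spaces).

```python
def parse_arff(lines):
    """Parse a simple dense ARFF file into (attribute_names, rows).

    Assumption: values contain no embedded commas and attribute names
    contain no spaces; sparse ARFF is not handled.
    """
    names, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip().strip("'\"") for v in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def to_labeled_tokens(names, rows, class_attr="class"):
    """Turn each row into 'label<TAB>attr=value attr=value ...'.

    class_attr is a hypothetical name; use whatever your relation calls
    its class attribute. Missing values ('?') are dropped.
    """
    class_idx = names.index(class_attr)
    docs = []
    for row in rows:
        tokens = [
            f"{name}={value}"
            for i, (name, value) in enumerate(zip(names, row))
            if i != class_idx and value != "?"
        ]
        docs.append(row[class_idx] + "\t" + " ".join(tokens))
    return docs


if __name__ == "__main__":
    sample = [
        "@relation demo",
        "@attribute color {red,blue}",
        "@attribute size numeric",
        "@attribute class {valid,invalid}",
        "@data",
        "red,3,valid",
        "blue,?,invalid",
    ]
    names, rows = parse_arff(sample)
    for doc in to_labeled_tokens(names, rows):
        print(doc)
```

Each token then acts as one "word", so nominal attributes map naturally onto a word-count model. Numeric attributes would probably need to be binned into ranges first, since every distinct number otherwise becomes its own token.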
