mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Classifying general Attribute-Relation data using Mahout
Date Tue, 09 Feb 2010 17:43:49 GMT
Oops. The ARFF Driver writes only vectors, not the tab-separated format the
Bayes Classifier reads. I will try to add that as a flag.
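
Until such a flag exists, the conversion can be sketched outside Mahout. The snippet below is only an illustration under stated assumptions: that the Bayes Classifier input is one line per example of the form label<TAB>space-separated tokens, that the class is the last ARFF attribute, and that the ARFF file is dense with no quoted or missing values. The function name `arff_to_tsv` and the `name=value` token scheme are hypothetical, not part of Mahout.

```python
def arff_to_tsv(arff_text):
    """Turn simple, dense ARFF text into 'label<TAB>tokens' lines.

    Assumptions (not taken from Mahout itself): the class is the last
    attribute, values contain no quotes or embedded commas, and each
    non-class attribute becomes one 'name=value' token.
    """
    names = []
    lines_out = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(names):
            raise ValueError("row does not match attribute count: " + line)
        label = values[-1]
        tokens = [n + "=" + v for n, v in zip(names[:-1], values[:-1])]
        lines_out.append(label + "\t" + " ".join(tokens))
    return lines_out
```

Encoding each attribute as a single `name=value` token keeps attribute values from being split into unrelated words, which is roughly the problem described in the original question.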

@Grant: For batch classification, yes, we can go with vectors. But I don't see
how we can classify documents on the fly if the dictionary can't fit in
memory. Maybe randomizers can help. We will have to wait for that.
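
To make the dictionary concern concrete: if "randomizers" is read as hashed feature encoding (an assumption about what is meant here, not something this thread confirms), a token can be mapped to a vector index by hashing, so no token-to-index dictionary has to be held in memory. A minimal sketch, with a hypothetical function name and an arbitrary bucket count:

```python
import zlib


def hashed_vector(tokens, num_buckets=1024):
    """Build a sparse count vector without a dictionary.

    Each token's index is a hash of the token modulo num_buckets, so
    memory use is bounded by num_buckets rather than vocabulary size.
    Distinct tokens may collide in one bucket; that is the trade-off.
    """
    vec = {}
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        idx = zlib.crc32(tok.encode("utf-8")) % num_buckets
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec
```

The same function can be applied to a single incoming document at classification time, as long as the model was trained on vectors built with the same hash and bucket count.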

@Ted: Waiting to pounce upon the randomizers :)


Robin

On Tue, Feb 9, 2010 at 9:08 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Feb 8, 2010, at 7:54 AM, Martin Häger wrote:
>
> > Hi,
> >
> > We're experimenting a bit with Weka and Mahout. Our input data is a
> > relation in ARFF format (see attached data.training.arff), and we'd
> > like to classify it using Mahout. However, it seems (to us, at first)
> > that the Mahout classifier.bayes.interfaces.Algorithm interface is
> > centered around documents of text, and not general attribute data.
> > Thus, running the classifier causes our ARFF data to be interpreted as
> > a document of words, with not very useful results (see attached
> > mahout.log).
>
> I think we still need to get our Bayes stuff to run off of Vectors instead
> of text, then it should be easy to go from ARFF to Vector format and then
> run all of the Mahout tools.
>
> -Grant
