mahout-dev mailing list archives

From Grant Ingersoll
Subject Re: discussion of input conversions
Date Wed, 24 Aug 2011 23:21:23 GMT
GPL as part of our code base is pretty much a non-starter.  You maybe could have come up w/
some workarounds, but see, especially the exception

Also, we already have a basic ARFF reader in the integration module.  It would be cool if
someone who had more examples and familiarity w/ ARFF were to take it up a notch.
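
For readers unfamiliar with ARFF, here is a minimal sketch of what "basic ARFF reading" involves. It is not the code in Mahout's integration module and not Weka's API; the class name SimpleArff and its fields are hypothetical, made up for illustration. It assumes dense rows, unquoted names, and only numeric and nominal attributes. Sparse rows, quoted strings, and string/date attributes are exactly the kind of thing "taking it up a notch" would have to cover.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal ARFF reader; an illustration only, not Mahout's or Weka's API. */
class SimpleArff {
  final List<String> attributeNames = new ArrayList<>();
  // Allowed labels for each nominal attribute; null marks a numeric attribute.
  final List<List<String>> nominalLabels = new ArrayList<>();
  // One dense row per data line; nominal values are stored as label indices.
  final List<double[]> rows = new ArrayList<>();

  static SimpleArff parse(String text) {
    SimpleArff arff = new SimpleArff();
    boolean inData = false;
    for (String raw : text.split("\\r?\\n")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("%")) {
        continue; // blank line or ARFF comment
      }
      String lower = line.toLowerCase();
      if (!inData && lower.startsWith("@attribute")) {
        // Assumes "@attribute <name> <type>" with an unquoted name.
        String[] parts = line.split("\\s+", 3);
        arff.attributeNames.add(parts[1]);
        String type = parts[2].trim();
        if (type.startsWith("{")) {
          List<String> labels = new ArrayList<>();
          String body = type.substring(1, type.lastIndexOf('}'));
          for (String label : body.split(",")) {
            labels.add(label.trim());
          }
          arff.nominalLabels.add(labels);
        } else {
          arff.nominalLabels.add(null); // treated as numeric
        }
      } else if (!inData && lower.startsWith("@data")) {
        inData = true;
      } else if (inData) {
        String[] cells = line.split(",");
        double[] row = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
          String cell = cells[i].trim();
          List<String> labels = arff.nominalLabels.get(i);
          if (cell.equals("?")) {
            row[i] = Double.NaN; // ARFF missing-value marker
          } else if (labels == null) {
            row[i] = Double.parseDouble(cell);
          } else {
            row[i] = labels.indexOf(cell); // -1 if the label was not declared
          }
        }
        arff.rows.add(row);
      }
      // Other header lines such as @relation are ignored in this sketch.
    }
    return arff;
  }
}
```

Storing nominal values as indices keeps each row a plain double array, which is one possible stand-in for an Instance-like object; a real reader would likely want a richer type.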

On Aug 24, 2011, at 3:09 PM, Ted Dunning wrote:

> Praneet and I were just talking about a project he is working on to do with
> higher-order learning methods such as boosting and feature sharding.  This
> is all pretty much in the context of classification and possibly clustering.
> The problems are:
> a) Mahout doesn't have a general input format for classifiable data (this
> has been discussed recently)
> b) hashed vector representations are not suitable for feature sharding since
> individual features may be redundantly represented in many locations.
> c) Mahout doesn't have a reasonable data structure for general data transfer
> (related to (a))
> One possible thought is that Mahout could introduce Weka as a dependency.
> The virtues would be:
> 1) Weka has ARFF as a data format and Instance as an object to satisfy (a)
> and (c)
> 2) Weka provides a bunch of simple classifier algorithms which are not
> individually scalable, but might be made to be so by model averaging or
> feature sharding.
> 3) Praneet could finish his project very quickly.
> Any thoughts about this?
> The problems that I see with this include:
> A) Weka is GPL which might slow adoption of Mahout and would certainly
> inhibit direct incorporation of any piece of Weka
> B) Weka appears to have not caught the maven bug which makes it harder to
> add as a dependency without actually distributing the weka jar.
> One possible work-around might be to reverse engineer something like
> Instance and an ARFF reader/writer.
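
To make problem (b) in the quoted message concrete, here is a small sketch of hashed encoding with multiple probes. It is an illustration under assumptions, not Mahout's encoder code: the class HashedEncodingSketch, the "#probe" suffix scheme, and the use of String.hashCode are all hypothetical choices. The point it shows is that one named feature lands in several vector slots, and a slot can be shared by unrelated features, so there is no single column to assign to a shard.

```java
/** Hypothetical illustration of multi-probe feature hashing; not Mahout's encoder API. */
class HashedEncodingSketch {

  /** Returns the vector slots a feature is hashed to, one per probe. */
  static int[] locations(String feature, int probes, int size) {
    int[] slots = new int[probes];
    for (int p = 0; p < probes; p++) {
      // Assumed scheme: vary the hashed string per probe to get different slots.
      int h = (feature + "#" + p).hashCode();
      slots[p] = Math.floorMod(h, size); // floorMod keeps the index non-negative
    }
    return slots;
  }

  /** Adds the feature's weight at every probed slot of the vector. */
  static void add(double[] vector, String feature, double weight, int probes) {
    for (int slot : locations(feature, probes, vector.length)) {
      vector[slot] += weight;
    }
  }
}
```

With an explicit column per attribute, as in an ARFF-style dense row, sharding by feature is just a partition of column indices, which is why a non-hashed representation comes up in this thread.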

Grant Ingersoll
