mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: discussion of input conversions
Date Wed, 24 Aug 2011 23:21:23 GMT
GPL code as part of our code base is pretty much a non-starter.  You might be able to come up
with some workarounds, but see http://www.apache.org/legal/3party.html, especially the exception
options.

Also, we already have a basic ARFF reader in the integration module.  It would be cool if
someone with more examples and more familiarity with ARFF were to take it up a notch.
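
For anyone unfamiliar with the format, here is a rough sketch of what "basic ARFF reading" involves. It is written in Python for brevity and is not the reader in the integration module; the function name read_arff is made up for illustration. It assumes dense rows only, and it does not handle sparse rows, date attributes, or quoted attribute names containing spaces.

```python
import csv


def read_arff(text):
    """Parse a small subset of ARFF: @relation, @attribute, and dense @data rows.

    Hypothetical helper for illustration; not a Mahout or Weka API.
    Returns (relation, attributes, rows), where attributes is a list of
    (name, type) pairs and type is a lowercase type name or a list of
    nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith('%'):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith('@relation'):
                relation = line.split(None, 1)[1].strip()
            elif lower.startswith('@attribute'):
                _, name, kind = line.split(None, 2)
                kind = kind.strip()
                if kind.startswith('{'):
                    # Nominal attribute: keep the list of allowed values.
                    kind = [v.strip() for v in kind.strip('{}').split(',')]
                else:
                    kind = kind.lower()
                attributes.append((name, kind))
            elif lower.startswith('@data'):
                in_data = True
        else:
            values = next(csv.reader([line], skipinitialspace=True))
            row = []
            for (name, kind), value in zip(attributes, values):
                value = value.strip()
                if value == '?':
                    # '?' marks a missing value in ARFF.
                    row.append(None)
                elif kind in ('numeric', 'real', 'integer'):
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
    return relation, attributes, rows
```

Taking it "up a notch" would mean covering the cases this sketch skips, which is where more real-world example files would help.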

On Aug 24, 2011, at 3:09 PM, Ted Dunning wrote:

> Praneet and I were just talking about a project he is working on to do with
> higher-order learning methods such as boosting and feature sharding.  This
> is all pretty much in the context of classification and possibly clustering.
> 
> The problems are:
> 
> a) Mahout doesn't have a general input format for classifiable data (this
> has been discussed recently)
> 
> b) hashed vector representations are not suitable for feature sharding since
> individual features may be redundantly represented in many locations.
> 
> c) Mahout doesn't have a reasonable data structure for general data transfer
> (related to -a-)
> 
> One possible thought is that Mahout could introduce Weka as a dependency.
> 
> The virtues would be:
> 
> 1) Weka has ARFF as a data format and Instance as an object to satisfy (a)
> and (c)
> 
> 2) Weka provides a bunch of simple classifier algorithms which are not
> individually scalable, but might be made to be so by model averaging or
> feature sharding.
> 
> 3) Praneet could finish his project very quickly.
> 
> Any thoughts about this?
> 
> The problems that I see with this include:
> 
> A) Weka is GPL which might slow adoption of Mahout and would certainly
> inhibit direct incorporation of any piece of Weka
> 
> B) Weka does not appear to have caught the Maven bug, which makes it harder
> to add as a dependency without actually distributing the Weka jar.
> 
> One possible work-around might be to reverse engineer something like
> Instance and an ARFF reader/writer.
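
To give a sense of how small the work-around in the last quoted paragraph could start out, here is a minimal sketch of an Instance-like record plus an ARFF writer. It is in Python for brevity rather than Java, it is written from the public description of the file format rather than from any Weka code, and the names Instance and write_arff are hypothetical, not Weka or Mahout APIs. It assumes dense rows and attribute names without spaces.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Instance:
    """One row of classifiable data (hypothetical type for illustration).

    Each value is a float, a string, or None for a missing value.
    """
    values: List[Optional[Union[float, str]]]


def write_arff(relation, attributes, instances):
    """Serialize instances as dense ARFF text.

    attributes is a list of (name, type) pairs, where type is a type name
    such as 'numeric' or a list of nominal values.
    """
    lines = ['@relation ' + relation, '']
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attributes are written as {a,b,c}.
            kind = '{' + ','.join(kind) + '}'
        lines.append('@attribute {} {}'.format(name, kind))
    lines.append('')
    lines.append('@data')
    for inst in instances:
        # Missing values are written as '?'.
        cells = ['?' if v is None else str(v) for v in inst.values]
        lines.append(','.join(cells))
    return '\n'.join(lines) + '\n'
```

A Java version along these lines would give a common record type for (a) and (c) without pulling in any GPL code.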

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com

