mahout-dev mailing list archives

From Dan Brickley <dan...@danbri.org>
Subject Re: discussion of input conversions
Date Fri, 26 Aug 2011 10:10:07 GMT
On 25 August 2011 00:09, Ted Dunning <ted.dunning@gmail.com> wrote:
> Praneet and I were just talking about a project he is working on to do with
> higher-order learning methods such as boosting and feature sharding.  This
> is all pretty much in the context of classification and possibly clustering.
>
> The problems are:
>
> a) mahout doesn't have a general input format for classifiable data (this
> has been discussed recently)
>
> b) hashed vector representations are not suitable for feature sharding since
> individual features may be redundantly represented in many locations.
>
> c) mahout doesn't have a reasonable data structure for general data transfer
> (related to -a-)

Re (c),
Could Apache Pig's store/load subsystem be useful here? A possible
side benefit is that data on the same Hadoop cluster becomes amenable
to both Mahout and Pig-based hackery / analysis / scripting. The code
is also already in the Apache universe, which reduces friction around
licensing, Maven, etc.

http://pig.apache.org/docs/r0.9.0/func.html#load-store-functions
http://pig.apache.org/docs/r0.9.0/func.html#pigdump
http://pig.apache.org/docs/r0.9.0/func.html#pigstorage
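
To make the idea a bit more concrete, here is a rough sketch of what
consuming PigStorage-style output could look like on the reading side.
PigStorage writes delimited text with tab as its default field
delimiter; everything else here is an assumption for illustration only:
the record layout (a label followed by numeric features) and the helper
names parse_pigstorage_line / read_records are hypothetical and are not
part of Pig or Mahout.

```python
import io


def parse_pigstorage_line(line, delimiter="\t"):
    """Split one delimited text record into (label, features).

    Assumed (hypothetical) layout: first field is the class label,
    remaining fields are numeric feature values.
    """
    fields = line.rstrip("\n").split(delimiter)
    label = fields[0]
    features = [float(value) for value in fields[1:]]
    return label, features


def read_records(handle, delimiter="\t"):
    """Yield (label, features) for every non-blank line in a file-like object."""
    for line in handle:
        if line.strip():
            yield parse_pigstorage_line(line, delimiter)


# Stand-in for a part file written by a Pig STORE with the default delimiter.
sample = io.StringIO("spam\t1.0\t0.0\t3.5\nham\t0.0\t2.0\t0.25\n")
records = list(read_records(sample))
```

A real integration would more likely plug into Pig's load/store
function interfaces on the Java side; the point of the sketch is only
that the on-disk form is simple enough for both tools to share.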

cheers,

Dan
