mahout-dev mailing list archives

From Dan Brickley <>
Subject Re: discussion of input conversions
Date Fri, 26 Aug 2011 10:10:07 GMT
On 25 August 2011 00:09, Ted Dunning <> wrote:
> Praneet and I were just talking about a project he is working on to do with
> higher-order learning methods such as boosting and feature sharding.  This
> is all pretty much in the context of classification and possibly clustering.
> The problems are:
> a) mahout doesn't have a general input format for classifiable data (this
> has been discussed recently)
> b) hashed vector representations are not suitable for feature sharding since
> individual features may be redundantly represented in many locations.
> c) mahout doesn't have a reasonable data structure for general data transfer
> (related to -a-)

Re (c),
Could Apache Pig's store/load subsystem be useful here, with the possible
side-benefit of making data on the same Hadoop cluster amenable to
both Mahout and Pig-based hackery / analysis / scripting? The code is also
already in the Apache universe, which reduces friction around
licensing, Maven, etc.
