mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout - Pig Hackday
Date Wed, 02 May 2012 18:13:54 GMT
On Wed, May 2, 2012 at 11:06 AM, Timothy Potter <thelabdude@gmail.com> wrote:

> We're really keen on Ted's pig-vector project
> (https://github.com/tdunning/pig-vector) as we're building a number of
> classifiers on Mahout's SGD framework, with the bulk of our data being
> in Cassandra processed almost entirely with Pig. We'd love to hear
> about any planned features for the pig-vector project we can help out
> on. Any similar Pig-Mahout projects we should know about?
>

The huge problem with pig-vector is that its dependency on Elephant Bird
makes it almost impossible to build.  Elephant Bird has obscure
dependencies on things like yaml-beans.  That is a problem because the
yaml-beans maintainer has a forceful way of expressing his distaste for all
things to do with Maven and thus refuses to publish any artifacts in
standard ways.  Actually, the maintainer has a rather forceful manner that
he applies to all interactions as far as I can tell.

On the other hand, the capabilities that pig-vector needs from Elephant
Bird are quite minor and could probably be extracted without much trouble.
I am underwater, however, and thus cannot finish that right away.  I can
and will assist anybody who has the necessary time and enthusiasm.  This
might make a very nice Pig hackday effort.


> In general, we're reaching out today to see who else in the community
> is interested in better Pig / Mahout integration and what types of
> challenges they're facing? Any cool UDFs you'd like to share?
>

Praneet at UCI (praneetmhatre@gmail.com) has been doing some interesting
work here on feature sharding in Pig.  Perhaps he can speak up.
