mahout-dev mailing list archives

From Dhruv Kumar <dku...@ecs.umass.edu>
Subject Re: Apache Giraph?
Date Mon, 05 Sep 2011 18:08:10 GMT
On Mon, Sep 5, 2011 at 9:02 AM, Jake Mannix <jake.mannix@gmail.com> wrote:

>
>  This is my impression too.  The more I play with Spark, the more it looks
> like "the Right Paradigm" for this kind of computation: how many years have I
> been complaining that all I've ever wanted from Hadoop (and/or Mahout) is to
> be able to say something like:
>
>  vectors = load("hdfs://mydataFile");
>  vectors.map(new Function<Vector, Vector>() {
>                       Vector apply(Vector in) { return in.normalize(1); } })
>            .filter(new Predicate<Vector>() {
>                       boolean apply(Vector in) {
>                         return in.numNonDefaultValues() < 1000; } })
>            .reduce(new Function<Pair<Vector, Vector>, Vector>() {
>                       Vector apply(Pair<Vector, Vector> pair) {
>                         return pair.getFirst().plus(pair.getSecond()); } });
>


+1 for advocating side-effect-free programming!
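For comparison, the shape of the pipeline Jake describes can be approximated on a single machine with plain `java.util.stream`, with no Hadoop or Spark involved. This is only a minimal sketch under some assumptions: a `double[]` stands in for a Mahout `Vector`, `normalize(1)` is taken to mean dividing by the L1 norm, and "non-default" is taken to mean non-zero. The class and helper names (`VectorPipeline`, `normalizeL1`, `numNonZero`, `plus`, `sumOfSparseNormalized`) are hypothetical and not part of any Mahout or Spark API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical single-machine sketch of the map/filter/reduce pipeline;
// double[] is an assumed stand-in for a Mahout Vector.
class VectorPipeline {

    // Assumed meaning of normalize(1): divide by the L1 norm (sum of |x_i|).
    // Returns a new array, so the input is never mutated.
    static double[] normalizeL1(double[] v) {
        double norm = Arrays.stream(v).map(Math::abs).sum();
        if (norm == 0.0) {
            return v.clone(); // avoid dividing by zero for an all-zero vector
        }
        return Arrays.stream(v).map(x -> x / norm).toArray();
    }

    // Assumed meaning of "non-default": entries that are not 0.0.
    static long numNonZero(double[] v) {
        return Arrays.stream(v).filter(x -> x != 0.0).count();
    }

    // Element-wise sum into a fresh array; assumes equal lengths.
    static double[] plus(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // normalize -> keep vectors with fewer than maxNonZero non-zeros -> sum.
    // Empty if every vector is filtered out.
    static Optional<double[]> sumOfSparseNormalized(List<double[]> vectors, long maxNonZero) {
        return vectors.stream()
                .map(VectorPipeline::normalizeL1)
                .filter(v -> numNonZero(v) < maxNonZero)
                .reduce(VectorPipeline::plus);
    }

    public static void main(String[] args) {
        List<double[]> vectors = List.of(
                new double[] {1.0, 3.0, 0.0},
                new double[] {2.0, 0.0, 2.0},
                new double[] {1.0, 1.0, 1.0});
        sumOfSparseNormalized(vectors, 3)
                .ifPresent(sum -> System.out.println(Arrays.toString(sum)));
    }
}
```

With the sample input, the third vector has three non-zero entries and is filtered out, so `main` prints `[0.75, 0.75, 0.5]`. One design difference from the wish-list code: `Stream.reduce` takes a `BinaryOperator<T>` rather than a function over a `Pair`. Because every step returns a new array, the pipeline stays side-effect free.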

Twister is pretty interesting too and can model Hadoop jobs in a functional
style:
http://www.iterativemapreduce.org/
