crunch-dev mailing list archives

From David Whiting
Subject Alternative strategy for incorporating Java 8 lambdas into Crunch
Date Fri, 11 Dec 2015 15:42:33 GMT
While fixing the bug where the IFn version of mapValues on PGroupedTable
was missing, I got to thinking that this is quite an inefficient way of
including support for lambdas and method references, and it still doesn't
actually support quite a few of the features that would make it easy to
code against.

Drawbacks of the existing lambda implementation:
1) Explosion of already-crowded PCollection, PTable and PGroupedTable
interfaces, and having to implement those methods in all implementations.
2) Not supporting flatMap to Optional or Stream types.
3) Not exposing convenient types for reduce-type operations (Stream instead
of Iterable, for example).

Something that would solve all three of these is to build lambda support as
a separate artifact (so we can use all Java 8 types), and instead of the API
being directly on the PSomething interfaces, we just have convenient ways
to wrap up lambdas into DoFns or MapFns via statically-imported methods.

The usage then becomes
import static org.apache.crunch.Lambda.*;
someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
otherGroupedTable.mapValues(reduce(seq -> seq.mapToInt(i -> i).sum()),

Where flatMap and reduce are static methods on Lambda, and Lambda goes in
its own artifact (to preserve compatibility with Java 6 and 7 for the rest of
Crunch).

I've attached a basic proof-of-concept implementation which I've tested a
few things with, and I'm very happy to sketch out a more substantial
implementation if people here think it's a good idea in general.
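
To make the idea concrete, here is a minimal sketch of how such static
wrappers might look. It is not the attached proof of concept. SketchEmitter,
SketchDoFn and SketchMapFn are hypothetical stand-ins for Crunch's Emitter,
DoFn and MapFn so that the example compiles without Crunch on the classpath,
and the LambdaSketch class name and the flatMapOptional method are likewise
invented for illustration. It also ignores serialization, which a real
implementation would probably need to handle for the wrapped lambdas.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for Crunch's Emitter (assumption, not the real type).
interface SketchEmitter<T> {
    void emit(T value);
}

// Hypothetical stand-in for Crunch's DoFn: zero or more outputs per input.
abstract class SketchDoFn<S, T> {
    abstract void process(S input, SketchEmitter<T> emitter);
}

// Hypothetical stand-in for Crunch's MapFn: exactly one output per input.
abstract class SketchMapFn<S, T> extends SketchDoFn<S, T> {
    abstract T map(S input);

    @Override
    void process(S input, SketchEmitter<T> emitter) {
        emitter.emit(map(input));
    }
}

// Static factory methods that wrap Java 8 lambdas into the function types above.
final class LambdaSketch {
    private LambdaSketch() {}

    // Wraps a lambda returning a Stream; every element of the stream is emitted.
    static <S, T> SketchDoFn<S, T> flatMap(Function<? super S, Stream<T>> fn) {
        return new SketchDoFn<S, T>() {
            @Override
            void process(S input, SketchEmitter<T> emitter) {
                fn.apply(input).forEach(emitter::emit);
            }
        };
    }

    // Wraps a lambda returning an Optional; emits the value only if present.
    static <S, T> SketchDoFn<S, T> flatMapOptional(Function<? super S, Optional<T>> fn) {
        return new SketchDoFn<S, T>() {
            @Override
            void process(S input, SketchEmitter<T> emitter) {
                fn.apply(input).ifPresent(emitter::emit);
            }
        };
    }

    // Exposes the grouped values as a Stream rather than an Iterable.
    static <E, T> SketchMapFn<Iterable<E>, T> reduce(Function<Stream<E>, T> fn) {
        return new SketchMapFn<Iterable<E>, T>() {
            @Override
            T map(Iterable<E> values) {
                return fn.apply(StreamSupport.stream(values.spliterator(), false));
            }
        };
    }
}
```

With wrappers along these lines, the collection interfaces only ever see an
ordinary function object, so no lambda-specific overloads are needed on
PCollection, PTable or PGroupedTable, and the Java 8 types stay confined to
the separate artifact.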

Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
