crunch-dev mailing list archives

From David Whiting <d...@apache.org>
Subject Re: Alternative strategy for incorporating Java 8 lambdas into Crunch
Date Mon, 14 Dec 2015 23:13:25 GMT
Ok, so I've implemented a few iterations of this. I went forward with the
"wrap the functions" method, which seemed to work alright, but finding good
names for functions which essentially just wrap functions but which aren't
ambiguous in erasure and read nicely was a real challenge. I showed some
sample code to some of my fellow data engineers and the consensus seemed to
be that it was definitely better than anonymous inner classes, but it still
felt kind of awkward and strange to use.
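
To make the "wrap the functions" idea concrete, here is a minimal sketch. It
is not the code from the gists: Emitter and DoFn below are simplified
stand-ins for the Crunch types (the real DoFn is also Serializable, which this
ignores), and Lambda, flatMapOptional and WrapFunctionsDemo are hypothetical
names chosen for illustration. It also shows the naming problem: overloads
taking Function<S, Stream<T>> and Function<S, Optional<T>> have the same
erasure, so they cannot share a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Stream;

// Simplified stand-ins for Crunch's Emitter and DoFn (assumption: not the real classes).
interface Emitter<T> {
    void emit(T value);
}

abstract class DoFn<S, T> {
    abstract void process(S input, Emitter<T> emitter);
}

// Hypothetical holder for the statically-imported wrapper methods.
final class Lambda {
    private Lambda() {}

    // Wraps a lambda returning a Stream into a DoFn that emits every element.
    static <S, T> DoFn<S, T> flatMap(Function<S, Stream<T>> fn) {
        return new DoFn<S, T>() {
            @Override
            void process(S input, Emitter<T> emitter) {
                fn.apply(input).forEach(emitter::emit);
            }
        };
    }

    // Needs its own name: flatMap(Function<S, Optional<T>>) would erase to
    // flatMap(Function), clashing with the Stream version above.
    static <S, T> DoFn<S, T> flatMapOptional(Function<S, Optional<T>> fn) {
        return new DoFn<S, T>() {
            @Override
            void process(S input, Emitter<T> emitter) {
                fn.apply(input).ifPresent(emitter::emit);
            }
        };
    }
}

class WrapFunctionsDemo {
    // Runs a DoFn over a list in memory, standing in for parallelDo.
    static <S, T> List<T> run(List<S> input, DoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S s : input) {
            fn.process(s, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        DoFn<String, String> splitter =
            Lambda.flatMap(line -> Arrays.stream(line.split(" ")));
        System.out.println(run(Arrays.asList("a b", "c"), splitter));
    }
}
```

With a static import of Lambda.*, the call site reads as
parallelDo(flatMap(...), ptype), which is the shape shown in the original
proposal quoted below.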

So here's a 3rd option: wrap the collection types rather than the function
types, and present an API which feels truly Java 8 native whilst still
being able to dig back to the underlying PCollections (doing pretty much
what Scrunch does, but with less implicit Scala magic).

Here's a super-minimal proof-of-concept for that:
https://gist.github.com/DavW/7efe484ea0c00cf6e66b

and a comparison of the two approaches in usage:
https://gist.github.com/DavW/997a92b31d55c5317fb7
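
For anyone reading without the gists open, the collection-wrapping idea can be
sketched roughly as follows. This is an illustration under assumptions, not
the gist code: StubCollection is an in-memory stand-in for a PCollection, and
WrappedCollection, underlying() and WrapCollectionsDemo are hypothetical
names. The point is that the lambda-friendly methods live on the wrapper, and
the wrapped collection stays reachable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// In-memory stand-in for a PCollection (assumption: not the real Crunch type).
// Its single primitive hands each element plus an output sink to fn.
final class StubCollection<T> {
    final List<T> elements;

    StubCollection(List<T> elements) {
        this.elements = elements;
    }

    <U> StubCollection<U> parallelDo(BiConsumer<T, Consumer<U>> fn) {
        List<U> out = new ArrayList<>();
        for (T t : elements) {
            fn.accept(t, out::add);
        }
        return new StubCollection<U>(out);
    }
}

// Hypothetical wrapper exposing a Java 8 style API over the stub.
final class WrappedCollection<T> {
    private final StubCollection<T> underlying;

    WrappedCollection(StubCollection<T> underlying) {
        this.underlying = underlying;
    }

    // Escape hatch back to the wrapped collection.
    StubCollection<T> underlying() {
        return underlying;
    }

    <U> WrappedCollection<U> map(Function<T, U> fn) {
        return new WrappedCollection<U>(
            underlying.<U>parallelDo((t, sink) -> sink.accept(fn.apply(t))));
    }

    <U> WrappedCollection<U> flatMap(Function<T, Stream<U>> fn) {
        return new WrappedCollection<U>(
            underlying.<U>parallelDo((t, sink) -> fn.apply(t).forEach(sink)));
    }

    WrappedCollection<T> filter(Predicate<T> keep) {
        return new WrappedCollection<T>(
            underlying.<T>parallelDo((t, sink) -> {
                if (keep.test(t)) {
                    sink.accept(t);
                }
            }));
    }
}

class WrapCollectionsDemo {
    // Splits lines into words and returns the length of each non-empty word.
    static List<Integer> wordLengths(List<String> lines) {
        return new WrappedCollection<String>(new StubCollection<String>(lines))
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .filter(word -> !word.isEmpty())
            .map(String::length)
            .underlying()
            .elements;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths(Arrays.asList("hello big world", "hi")));
    }
}
```

Because each operation is a distinct method on the wrapper, there is no need
to find unambiguous names for function-wrapping helpers, and callers never see
DoFn or MapFn unless they reach for underlying().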


On 13 December 2015 at 16:14, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> This looks very cool. As long as we can keep things compatible with
> Java 7 using whatever kind of maven voodoo that's necessary, I'm all
> for it.
>
> I'd say no real reason to keep the IFn stuff if this goes in.
>
> - Gabriel
>
> On Fri, Dec 11, 2015 at 11:18 PM, Josh Wills <josh.wills@gmail.com> wrote:
> > It seems like a net positive over the IFn stuff, so I could make an
> > argument for replacing it, but if there's anyone out there in love
> > w/IFns, they should speak up now. :)
> >
> > J
> >
> > On Fri, Dec 11, 2015 at 2:17 PM, David Whiting <davw@apache.org> wrote:
> >
> >> I *think* you can set language level and target jdk on a per-module
> >> basis, so it should be relatively easy. I'll experiment at some point
> >> over the weekend. Would this complement or replace the I*Fn stuff do
> >> you think? 14.0 is not yet released, so I guess it's not too late to
> >> change if we want to.
> >>
> >> On 11 December 2015 at 22:57, Josh Wills <josh.wills@gmail.com> wrote:
> >>
> >> > That's the sexiest thing I've seen in some time. +1 for a lambda
> >> > module, but how does that work in Maven-fu? Is it like a conditional
> >> > compile or something?
> >> >
> >> > On Fri, Dec 11, 2015 at 1:20 PM, David Whiting <davw@apache.org> wrote:
> >> >
> >> > > Oops, my bad. Here's a Gist:
> >> > > https://gist.github.com/DavW/e2588e42c45ad8c06038
> >> > >
> >> > > On 11 December 2015 at 18:43, Josh Wills <josh.wills@gmail.com> wrote:
> >> > >
> >> > > > I think it's kind of awesome, but the attachment didn't go
> >> > > > through - PR or gist?
> >> > > > On Fri, Dec 11, 2015 at 7:42 AM David Whiting <davw@apache.org> wrote:
> >> > > >
> >> > > > > While fixing the bug where the IFn version of mapValues on
> >> > > > > PGroupedTable was missing, I got thinking that this is quite an
> >> > > > > inefficient way of including support for lambdas and method
> >> > > > > references, and it still didn't actually support quite a few of
> >> > > > > the features that would make it easy to code against.
> >> > > > >
> >> > > > > Negative parts of existing lambda implementation:
> >> > > > > 1) Explosion of already-crowded PCollection, PTable and
> >> > > > > PGroupedTable interfaces, and having to implement those methods
> >> > > > > in all implementations.
> >> > > > > 2) Not supporting flatMap to Optional or Stream types.
> >> > > > > 3) Not exposing convenient types for reduce-type operations
> >> > > > > (Stream instead of Iterable, for example).
> >> > > > >
> >> > > > > Something that would solve all three of these is to build lambda
> >> > > > > support as a separate artifact (so we can use all java8 types),
> >> > > > > and instead of the API being directly on the PSomething
> >> > > > > interfaces, we just have convenient ways to wrap up lambdas into
> >> > > > > DoFns or MapFns via statically-imported methods.
> >> > > > >
> >> > > > > The usage then becomes
> >> > > > > import static org.apache.crunch.Lambda.*;
> >> > > > > ...
> >> > > > > someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
> >> > > > > ...
> >> > > > > otherGroupedTable.mapValues(reduce(seq -> seq.mapToInt(i -> i).sum()), ints())
> >> > > > >
> >> > > > > Where flatMap and reduce are static methods on Lambda, and Lambda
> >> > > > > goes in its own artifact (to preserve compatibility with Java 6
> >> > > > > and 7 for the rest of Crunch).
> >> > > > > I've attached a basic proof-of-concept implementation which I've
> >> > > > > tested a few things with, and I'm very happy to sketch out a
> >> > > > > more substantial implementation if people here think it's a good
> >> > > > > idea in general.
> >> > > > >
> >> > > > > Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
