crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Alternative strategy for incorporating Java 8 lambdas into Crunch
Date Mon, 14 Dec 2015 23:51:15 GMT
I think I lean towards the collections approach, but that's probably
because of my Scrunch experience. Two questions:

1) Is mapToTable necessary? I would think map(SFunction, PTableType) would
be distinguishable from map(SFunction, PType) by the compiler in the same
way it is for parallelDo.
2) Does the collections approach need a separate maven target at all, or
could it just be part of crunch-core as a replacement for the IFn stuff? Or
is there Java 8-only stuff we'll want to add in to its API?
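
For illustration, question 1 can be tried out with stand-in types. The sketch below does not use the real Crunch classes: `SFunction`, `Pair`, `PType`, `PTableType` and `Coll` are hypothetical minimal versions that only reproduce the relationship "a PTableType is a PType of Pair". Under that assumption, two methods that are both named `map` should resolve on the second argument, much as the parallelDo overloads do.

```java
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical stand-ins, NOT the real Crunch types; they only mirror
// the subtype relationship between PTableType and PType.
final class MapOverloadSketch {
    interface SFunction<S, T> extends Function<S, T>, Serializable {}

    static final class Pair<K, V> {
        final K first;
        final V second;
        Pair(K first, V second) { this.first = first; this.second = second; }
        static <K, V> Pair<K, V> of(K k, V v) { return new Pair<>(k, v); }
    }

    interface PType<T> {}
    interface PTableType<K, V> extends PType<Pair<K, V>> {}

    // Stand-in for a wrapped collection with two overloads named map.
    static final class Coll<S> {
        <T> String map(SFunction<S, T> fn, PType<T> ptype) {
            return "collection";
        }
        <K, V> String map(SFunction<S, Pair<K, V>> fn, PTableType<K, V> ptype) {
            return "table";
        }
    }

    // Only the PType overload is applicable here.
    static String viaPType() {
        PType<Integer> ints = new PType<Integer>() {};
        return new Coll<String>().map(s -> s.length(), ints);
    }

    // Both overloads are applicable; the PTableType one is more specific.
    static String viaPTableType() {
        PTableType<String, Integer> table = new PTableType<String, Integer>() {};
        return new Coll<String>().map(s -> Pair.of(s, s.length()), table);
    }
}
```

The lambda itself does not take part in choosing the overload in this sketch; the choice is driven by the static type of the second argument.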

On Mon, Dec 14, 2015 at 3:13 PM, David Whiting <davw@apache.org> wrote:

> Ok, so I've implemented a few iterations of this. I went forward with the
> "wrap the functions" method, which seemed to work alright, but finding good
> names for functions which essentially just wrap functions but which aren't
> ambiguous in erasure and read nicely was a real challenge. I showed some
> sample code to some of my fellow data engineers and the consensus seemed to
> be that it was definitely better than anonymous inner classes, but it still
> felt kind of awkward and strange to use.
>
> So here's a 3rd option: wrap the collection types rather than the function
> types, and present an API which feels truly Java 8 native whilst still
> being able to dig back to the underlying PCollections (doing pretty much
> what Scrunch does, but with less implicit Scala magic).
>
> Here's a super-minimal proof-of-concept for that:
> https://gist.github.com/DavW/7efe484ea0c00cf6e66b
>
> and a comparison of the two approaches in usage:
> https://gist.github.com/DavW/997a92b31d55c5317fb7
>
>
> On 13 December 2015 at 16:14, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>
> > This looks very cool. As long as we can keep things compatible with
> > Java 7 using whatever kind of maven voodoo that's necessary, I'm all
> > for it.
> >
> > I'd say no real reason to keep the IFn stuff if this goes in.
> >
> > - Gabriel
> >
> > On Fri, Dec 11, 2015 at 11:18 PM, Josh Wills <josh.wills@gmail.com> wrote:
> > > It seems like a net positive over the IFn stuff, so I could make an
> > > argument for replacing it, but if there's anyone out there in love w/IFns,
> > > they should speak up now. :)
> > >
> > > J
> > >
> > > On Fri, Dec 11, 2015 at 2:17 PM, David Whiting <davw@apache.org> wrote:
> > >
> > >> I *think* you can set language level and target jdk on a per-module basis,
> > >> so it should be relatively easy. I'll experiment at some point over the
> > >> weekend. Would this complement or replace the I*Fn stuff do you think? 14.0
> > >> is not yet released, so I guess it's not too late to change if we want to.
> > >>
> > >> On 11 December 2015 at 22:57, Josh Wills <josh.wills@gmail.com> wrote:
> > >>
> > >> > That's the sexiest thing I've seen in some time. +1 for a lambda module,
> > >> > but how does that work in Maven-fu? Is it like a conditional compile or
> > >> > something?
> > >> >
> > >> > On Fri, Dec 11, 2015 at 1:20 PM, David Whiting <davw@apache.org> wrote:
> > >> >
> > >> > > Oops, my bad. Here's a Gist:
> > >> > > https://gist.github.com/DavW/e2588e42c45ad8c06038
> > >> > >
> > >> > > On 11 December 2015 at 18:43, Josh Wills <josh.wills@gmail.com> wrote:
> > >> > >
> > >> > > > I think it's kind of awesome, but the attachment didn't go through -
> > >> > > > PR or gist?
> > >> > > > On Fri, Dec 11, 2015 at 7:42 AM David Whiting <davw@apache.org> wrote:
> > >> > > >
> > >> > > > > While fixing the bug where the IFn version of mapValues on PGroupedTable
> > >> > > > > was missing, I got thinking that this is quite an inefficient way of
> > >> > > > > including support for lambdas and method references, and it still didn't
> > >> > > > > actually support quite a few of the features that would make it easy to
> > >> > > > > code against.
> > >> > > > >
> > >> > > > > Negative parts of existing lambda implementation:
> > >> > > > > 1) Explosion of already-crowded PCollection, PTable and PGroupedTable
> > >> > > > > interfaces, and having to implement those methods in all implementations.
> > >> > > > > 2) Not supporting flatMap to Optional or Stream types.
> > >> > > > > 3) Not exposing convenient types for reduce-type operations (Stream
> > >> > > > > instead of Iterable, for example).
> > >> > > > >
> > >> > > > > Something that would solve all three of these is to build lambda support
> > >> > > > > as a separate artifact (so we can use all java8 types), and instead of the
> > >> > > > > API being directly on the PSomething interfaces, we just have convenient
> > >> > > > > ways to wrap up lambdas into DoFns or MapFns via statically-imported
> > >> > > > > methods.
> > >> > > > >
> > >> > > > > The usage then becomes
> > >> > > > > import static org.apache.crunch.Lambda.*;
> > >> > > > > ...
> > >> > > > > someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
> > >> > > > > ...
> > >> > > > > otherGroupedTable.mapValues(reduce(seq -> seq.mapToInt(i -> i).sum()), ints())
> > >> > > > > Where flatMap and reduce are static methods on Lambda, and Lambda goes in
> > >> > > > > its own artifact (to preserve compatibility with 6 and 7 for the rest of
> > >> > > > > Crunch).
> > >> > > > > I've attached a basic proof-of-concept implementation which I've tested a
> > >> > > > > few things with, and I'm very happy to sketch out a more substantial
> > >> > > > > implementation if people here think it's a good idea in general.
> > >> > > > >
> > >> > > > > Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>
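
The "wrap the functions" idea from the first message in the thread can be sketched without Crunch on the classpath. Everything below is an assumption made for illustration: `Emitter`, `DoFn` and `MapFn` are hypothetical stand-ins (abstract classes, so a lambda cannot implement them directly), and `flatMap` and `reduce` only imitate the shape of the static helpers the message describes. The real classes also have to be serializable, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LambdaWrapSketch {
    // Hypothetical stand-ins, NOT the real Crunch types.
    interface Emitter<T> { void emit(T value); }
    abstract static class DoFn<S, T> { abstract void process(S input, Emitter<T> emitter); }
    abstract static class MapFn<S, T> { abstract T map(S input); }

    // Wraps a lambda that returns a Stream into a DoFn emitting each element.
    static <S, T> DoFn<S, T> flatMap(Function<S, Stream<T>> fn) {
        return new DoFn<S, T>() {
            @Override
            void process(S input, Emitter<T> emitter) {
                fn.apply(input).forEach(emitter::emit);
            }
        };
    }

    // Wraps a lambda over a Stream into a MapFn over an Iterable, so the
    // caller sees a Stream instead of the raw Iterable of grouped values.
    static <V, T> MapFn<Iterable<V>, T> reduce(Function<Stream<V>, T> fn) {
        return new MapFn<Iterable<V>, T>() {
            @Override
            T map(Iterable<V> values) {
                return fn.apply(StreamSupport.stream(values.spliterator(), false));
            }
        };
    }

    // Usage: split a line into words through the wrapped DoFn.
    static List<String> splitWords(String line) {
        DoFn<String, String> fn = flatMap(s -> Arrays.stream(s.split(" ")));
        List<String> out = new ArrayList<>();
        fn.process(line, out::add);
        return out;
    }

    // Usage: sum grouped values, following the reduce example in the thread.
    static int sum(List<Integer> values) {
        MapFn<Iterable<Integer>, Integer> fn = reduce(seq -> seq.mapToInt(i -> i).sum());
        return fn.map(values);
    }
}
```

In this shape the existing PCollection-style interfaces gain no new methods; the Java 8 types appear only in the helper class, which is what would let it live in its own artifact.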
