crunch-dev mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Alternative strategy for incorporating Java 8 lambdas into Crunch
Date Tue, 15 Dec 2015 09:34:11 GMT
Yeah, looking at the two next to each other, I'm going for the
collections approach as well. +1

On Tue, Dec 15, 2015 at 2:04 AM, Josh Wills <josh.wills@gmail.com> wrote:
> On Mon, Dec 14, 2015 at 4:15 PM, David Whiting <davidwhiting@gmail.com> wrote:
>
>> 1) Not at all, just some leftover working names for stuff.
>>
>> 2) Not for a totally minimal implementation, but some of the features I
>> would like to include would rely on Java 8 things, for example adapting the
>> GroupedTable stuff to use Streams rather than Iterables because of a) the
>> extra expressivity and b) the implied once-only traversal. We could have a
>> filterMap which applies a Function<S, Optional<T>> (my most common use case
>> for a DoFn instead of a MapFn at the moment). We can also potentially
>> utilise Collectors for collapsing values in reduce-side stuff and finally,
>> it'll make the implementation of it a fair bit easier. The maven overhead
>> is pretty low, so I guess it's just the existence of an extra artifact to
>> consider. The way I see it is that it's a push to make the API feel more
>> like Java streams and be more immediately usable by someone who knows Java
>> streams but not necessarily big data, so the more we can replicate that
>> feel by integrating with other familiar Java 8 features, the better.
>>
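For illustration, here is a minimal sketch of the filterMap and Collector ideas using only the JDK. A Java 8 Stream stands in for a Crunch collection, and the names filterMap, parseInt and sumGroup are hypothetical. This is not the Crunch API or the code from the gists in this thread.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: plain Java 8 streams stand in for Crunch collections.
// filterMap and sumGroup are hypothetical names, not Crunch APIs.
final class FilterMapSketch {

    // Apply f to every element and keep only the present results, so
    // Optional.empty() means "emit nothing" for that input.
    static <S, T> Stream<T> filterMap(Stream<S> input, Function<S, Optional<T>> f) {
        return input.map(f).filter(Optional::isPresent).map(Optional::get);
    }

    // Example function: parse an integer, dropping malformed input.
    static Optional<Integer> parseInt(String s) {
        try {
            return Optional.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reduce side: a group's values arrive as a Stream, which can only be
    // traversed once, and are collapsed with a standard Collector.
    static int sumGroup(Stream<Integer> groupValues) {
        return groupValues.collect(Collectors.summingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        List<Integer> parsed = filterMap(Stream.of("1", "x", " 3 ", ""), FilterMapSketch::parseInt)
                .collect(Collectors.toList());
        System.out.println(parsed);                    // [1, 3]
        System.out.println(sumGroup(parsed.stream())); // 4
    }
}
```

Expressing "zero or one output" as an Optional return keeps the function pure, which is what lets it replace a DoFn that conditionally calls emit.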
>
> Makes sense to me. +1 for a new crunch-lambda module.
>
>
>>
>> On 15 December 2015 at 00:51, Josh Wills <josh.wills@gmail.com> wrote:
>>
>> > I think I lean towards the collections approach, but that's probably
>> > because of my Scrunch experience. Two questions:
>> >
>> > 1) Is mapToTable necessary? I would think map(SFunction, PTableType) would
>> > be distinguishable from map(SFunction, PType) by the compiler in the same
>> > way it is for parallelDo.
>> > 2) Does the collections approach need a separate maven target at all, or
>> > could it just be part of crunch-core as a replacement for the IFn stuff? Or
>> > is there Java 8-only stuff we'll want to add in to its API?
>> >
>> > On Mon, Dec 14, 2015 at 3:13 PM, David Whiting <davw@apache.org> wrote:
>> >
>> > > Ok, so I've implemented a few iterations of this. I went forward with the
>> > > "wrap the functions" method, which seemed to work alright, but finding
>> > > good names for functions which essentially just wrap functions, but which
>> > > aren't ambiguous in erasure and read nicely, was a real challenge. I
>> > > showed some sample code to some of my fellow data engineers and the
>> > > consensus seemed to be that it was definitely better than anonymous inner
>> > > classes, but it still felt kind of awkward and strange to use.
>> > >
>> > > So here's a 3rd option: wrap the collection types rather than the
>> > > function types, and present an API which feels truly Java 8 native
>> > > whilst still being able to dig back to the underlying PCollections
>> > > (doing pretty much what Scrunch does, but with less implicit Scala
>> > > magic).
>> > >
>> > > Here's a super-minimal proof-of-concept for that:
>> > > https://gist.github.com/DavW/7efe484ea0c00cf6e66b
>> > >
>> > > and a comparison of the two approaches in usage:
>> > > https://gist.github.com/DavW/997a92b31d55c5317fb7
>> > >
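A minimal sketch of the third option, assuming a simple wrapper design: operations take java.util.function types directly, and an accessor hands back the wrapped collection. An in-memory List stands in for a PCollection, and LambdaCollection and its method names are hypothetical. The actual proof of concept is in the gists linked above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration of "wrap the collection, not the function". An in-memory List
// stands in for a PCollection; LambdaCollection is a hypothetical name.
final class LambdaCollection<S> {
    private final List<S> underlying;

    private LambdaCollection(List<S> underlying) {
        this.underlying = underlying;
    }

    static <S> LambdaCollection<S> wrap(List<S> underlying) {
        return new LambdaCollection<>(underlying);
    }

    // Operations accept java.util.function types directly, so lambdas and
    // method references need no wrapper functions at the call site.
    <T> LambdaCollection<T> map(Function<S, T> f) {
        return wrap(underlying.stream().map(f).collect(Collectors.toList()));
    }

    LambdaCollection<S> filter(Predicate<S> p) {
        return wrap(underlying.stream().filter(p).collect(Collectors.toList()));
    }

    <T> LambdaCollection<T> flatMap(Function<S, Stream<T>> f) {
        return wrap(underlying.stream().flatMap(f).collect(Collectors.toList()));
    }

    // Escape hatch back to the wrapped collection.
    List<S> underlying() {
        return Collections.unmodifiableList(underlying);
    }

    public static void main(String[] args) {
        List<Integer> lengths = wrap(Arrays.asList("a bb", "ccc"))
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word -> !word.isEmpty())
                .map(String::length)
                .underlying();
        System.out.println(lengths); // [1, 2, 3]
    }
}
```

Because the wrapper owns the method names, the Java 8 operations no longer need to live on the already-crowded PCollection interfaces, and the underlying() accessor keeps the rest of the API reachable.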
>> > >
>> > > On 13 December 2015 at 16:14, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>> > >
>> > > > This looks very cool. As long as we can keep things compatible with
>> > > > Java 7 using whatever kind of maven voodoo that's necessary, I'm all
>> > > > for it.
>> > > >
>> > > > I'd say no real reason to keep the IFn stuff if this goes in.
>> > > >
>> > > > - Gabriel
>> > > >
>> > > > On Fri, Dec 11, 2015 at 11:18 PM, Josh Wills <josh.wills@gmail.com> wrote:
>> > > > > It seems like a net positive over the IFn stuff, so I could make an
>> > > > > argument for replacing it, but if there's anyone out there in love
>> > > > > w/IFns, they should speak up now. :)
>> > > > >
>> > > > > J
>> > > > >
>> > > > > On Fri, Dec 11, 2015 at 2:17 PM, David Whiting <davw@apache.org> wrote:
>> > > > >
>> > > > >> I *think* you can set language level and target JDK on a per-module
>> > > > >> basis, so it should be relatively easy. I'll experiment at some point
>> > > > >> over the weekend. Would this complement or replace the I*Fn stuff, do
>> > > > >> you think? 14.0 is not yet released, so I guess it's not too late to
>> > > > >> change if we want to.
>> > > > >>
>> > > > >> On 11 December 2015 at 22:57, Josh Wills <josh.wills@gmail.com> wrote:
>> > > > >>
>> > > > >> > That's the sexiest thing I've seen in some time. +1 for a lambda
>> > > > >> > module, but how does that work in Maven-fu? Is it like a
>> > > > >> > conditional compile or something?
>> > > > >> >
>> > > > >> > On Fri, Dec 11, 2015 at 1:20 PM, David Whiting <davw@apache.org> wrote:
>> > > > >> >
>> > > > >> > > Oops, my bad. Here's a Gist:
>> > > > >> > > https://gist.github.com/DavW/e2588e42c45ad8c06038
>> > > > >> > >
>> > > > >> > > On 11 December 2015 at 18:43, Josh Wills <josh.wills@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > > > I think it's kind of awesome, but the attachment didn't go
>> > > > >> > > > through. PR or gist?
>> > > > >> > > > On Fri, Dec 11, 2015 at 7:42 AM, David Whiting <davw@apache.org> wrote:
>> > > > >> > > >
>> > > > >> > > > > While fixing the bug where the IFn version of mapValues on
>> > > > >> > > > > PGroupedTable was missing, I got to thinking that this is quite
>> > > > >> > > > > an inefficient way of including support for lambdas and method
>> > > > >> > > > > references, and it still doesn't actually support quite a few of
>> > > > >> > > > > the features that would make it easy to code against.
>> > > > >> > > > >
>> > > > >> > > > > Negative parts of the existing lambda implementation:
>> > > > >> > > > > 1) Explosion of the already-crowded PCollection, PTable and
>> > > > >> > > > > PGroupedTable interfaces, and having to implement those methods
>> > > > >> > > > > in all implementations.
>> > > > >> > > > > 2) Not supporting flatMap to Optional or Stream types.
>> > > > >> > > > > 3) Not exposing convenient types for reduce-type operations
>> > > > >> > > > > (Stream instead of Iterable, for example).
>> > > > >> > > > >
>> > > > >> > > > > Something that would solve all three of these is to build lambda
>> > > > >> > > > > support as a separate artifact (so we can use all Java 8 types),
>> > > > >> > > > > and instead of the API being directly on the PSomething
>> > > > >> > > > > interfaces, we just have convenient ways to wrap up lambdas into
>> > > > >> > > > > DoFns or MapFns via statically imported methods.
>> > > > >> > > > >
>> > > > >> > > > > The usage then becomes:
>> > > > >> > > > > import static org.apache.crunch.Lambda.*;
>> > > > >> > > > > ...
>> > > > >> > > > > someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
>> > > > >> > > > > ...
>> > > > >> > > > > otherGroupedTable.mapValues(reduce(seq -> seq.mapToInt(i -> i).sum()),
>> > > > >> > > > >     ints())
>> > > > >> > > > >
>> > > > >> > > > > Here, flatMap and reduce are static methods on Lambda, and Lambda
>> > > > >> > > > > goes in its own artifact (to preserve compatibility with Java 6
>> > > > >> > > > > and 7 for the rest of Crunch).
>> > > > >> > > > >
>> > > > >> > > > > I've attached a basic proof-of-concept implementation which I've
>> > > > >> > > > > tested a few things with, and I'm very happy to sketch out a more
>> > > > >> > > > > substantial implementation if people here think it's a good idea
>> > > > >> > > > > in general.
>> > > > >> > > > >
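A minimal sketch of the static-helper approach. It assumes simplified stand-ins for Emitter, DoFn and MapFn (not the real org.apache.crunch classes) so that it compiles without Crunch on the classpath. Lambda, flatMap, reduce and SFunction follow the names used in this thread, but the bodies here are guesses, not the attached proof of concept.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Simplified stand-ins for Crunch's Emitter, DoFn and MapFn so the sketch
// compiles on its own; they are NOT the real org.apache.crunch classes.
interface Emitter<T> {
    void emit(T value);
}

abstract class DoFn<S, T> implements Serializable {
    abstract void process(S input, Emitter<T> emitter);
}

abstract class MapFn<S, T> extends DoFn<S, T> {
    abstract T map(S input);

    @Override
    void process(S input, Emitter<T> emitter) {
        emitter.emit(map(input));
    }
}

// A Function that is also Serializable, so a lambda targeting it can be
// serialized along with the DoFn that holds it.
interface SFunction<S, T> extends Function<S, T>, Serializable {}

// Hypothetical static helpers, in the spirit of "import static ...Lambda.*".
final class Lambda {
    private Lambda() {}

    // Wrap a lambda that returns a Stream into a DoFn emitting every element.
    static <S, T> DoFn<S, T> flatMap(SFunction<S, Stream<T>> f) {
        return new DoFn<S, T>() {
            @Override
            void process(S input, Emitter<T> emitter) {
                f.apply(input).forEach(emitter::emit);
            }
        };
    }

    // Wrap a lambda over a group's values (exposed as a one-shot Stream rather
    // than an Iterable) into a MapFn producing one result per group.
    static <V, T> MapFn<Iterable<V>, T> reduce(SFunction<Stream<V>, T> f) {
        return new MapFn<Iterable<V>, T>() {
            @Override
            T map(Iterable<V> values) {
                return f.apply(StreamSupport.stream(values.spliterator(), false));
            }
        };
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        DoFn<String, String> split = flatMap(line -> Arrays.stream(line.split(" ")));
        split.process("to be or", words::add);
        System.out.println(words); // [to, be, or]

        List<Integer> sums = new ArrayList<>();
        MapFn<Iterable<Integer>, Integer> sum = reduce(seq -> seq.mapToInt(i -> i).sum());
        sum.process(Arrays.asList(1, 2, 3), sums::add);
        System.out.println(sums); // [6]
    }
}
```

Making the functional interface extend Serializable matters here because Crunch serializes DoFns to ship them to tasks, and a lambda is only serializable if its target type is.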
>> > > > >> > > > > Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>>
