crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-538) Add support for Java lambdas to PCollection/PTable methods
Date Mon, 13 Jul 2015 23:21:05 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625554#comment-14625554 ]

Josh Wills commented on CRUNCH-538:
-----------------------------------

[~gabriel.reid] thanks for the comments here. Replying in an inline-y way.

First, for the XYZWithContext functions, I find that most developers who are good at writing
data pipelines make extensive use of counters to track progress and errors in their code.
I think that lambdas that don't allow the developer to have access to the counter/etc. info
are much less useful in practice. The other approach I could get on board with would be something
that looked like what Cloud Dataflow did, where we have two types of lambdas:

1) Lambdas that simply operate on the value directly and return an Iterable, single value,
boolean filter, etc. and
2) A lambda that takes a single FnContext object (or similar naming) that wraps up the current
value to be processed, the counters, the configuration, and the output emitter into a single
interface, which is modeled directly after Cloud Dataflow DoFns.
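To make the second kind of lambda concrete, here is a rough sketch of what such a context object could look like. Everything in it is hypothetical: FnContext, ContextFn, and LocalRunner are illustrative names, not existing Crunch or Cloud Dataflow APIs, the runner is just an in-memory stand-in for a pipeline stage, and access to the configuration is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FnContextSketch {

    /** Hypothetical context: the current value, the counters, and the output emitter. */
    public interface FnContext<S, T> {
        S element();                      // the value currently being processed
        void emit(T out);                 // write one output record
        void increment(String counter);   // bump a named counter by one
    }

    /** Hypothetical single-method interface, so a lambda can implement it. */
    @FunctionalInterface
    public interface ContextFn<S, T> {
        void process(FnContext<S, T> ctx);
    }

    /** In-memory stand-in for a pipeline stage; for illustration only. */
    public static final class LocalRunner<S, T> {
        public final List<T> output = new ArrayList<>();
        public final Map<String, Long> counters = new HashMap<>();

        public void run(Iterable<S> input, ContextFn<S, T> fn) {
            for (S value : input) {
                // A fresh context per element keeps the lambda free of shared state.
                fn.process(new FnContext<S, T>() {
                    public S element() { return value; }
                    public void emit(T out) { output.add(out); }
                    public void increment(String counter) {
                        counters.merge(counter, 1L, Long::sum);
                    }
                });
            }
        }
    }
}
```

With something like this, a lambda such as `ctx -> { try { ctx.emit(Integer.parseInt(ctx.element())); } catch (NumberFormatException e) { ctx.increment("bad_records"); } }` can parse its input and still count the records it had to drop, which is the counter use case described above.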

The advantage of this would be a) less tight coupling with the MapReduce stuff directly (although that
will always be unavoidable for legacy reasons) and b) we could collapse filterWithContext,
mapWithContext, flatMapWithContext into a single parallelDo-style implementation that could
still be a lambda. Honestly, the more I think about that, the more I like it.
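As a sketch of the collapse described in (b): with one context-style entry point, filter, map, and flatMap all become thin adapters over the same call. The names below (FnContext, ContextFn, and the static parallelDo over plain Lists) are assumptions made up for illustration and are not the actual PCollection API; counters and configuration are omitted to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class ParallelDoSketch {

    /** Hypothetical context exposing the current value and an emitter. */
    public interface FnContext<S, T> {
        S element();
        void emit(T out);
    }

    /** Hypothetical context-style function that a lambda can implement. */
    @FunctionalInterface
    public interface ContextFn<S, T> {
        void process(FnContext<S, T> ctx);
    }

    /** The single parallelDo-style entry point, run here over an in-memory list. */
    public static <S, T> List<T> parallelDo(List<S> input, ContextFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S value : input) {
            fn.process(new FnContext<S, T>() {
                public S element() { return value; }
                public void emit(T t) { out.add(t); }
            });
        }
        return out;
    }

    /** filter: emit the element only when the predicate accepts it. */
    public static <S> List<S> filter(List<S> input, Predicate<S> keep) {
        return parallelDo(input, ctx -> {
            if (keep.test(ctx.element())) {
                ctx.emit(ctx.element());
            }
        });
    }

    /** map: emit exactly one output per input. */
    public static <S, T> List<T> map(List<S> input, Function<S, T> fn) {
        return parallelDo(input, ctx -> ctx.emit(fn.apply(ctx.element())));
    }

    /** flatMap: emit zero or more outputs per input. */
    public static <S, T> List<T> flatMap(List<S> input, Function<S, Iterable<T>> fn) {
        return parallelDo(input, ctx -> {
            for (T t : fn.apply(ctx.element())) {
                ctx.emit(t);
            }
        });
    }
}
```

The three WithContext variants differ only in how many times they call emit per element, so a single context-style method can cover all of them while the simple value-only lambdas stay available as convenience wrappers.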

I hear you on the name parameter making debugging easier, and I'm amenable to adding it back in
as part of the FnContext change. We should try to encourage our best practices in these API
extensions, and named stages are definitely a best practice.

I hear you on the IFilterFn -> FilterFn stuff, but I'm trying to avoid recompilation for
folks who are happy rolling along on Java 7 etc., so I'd prefer not to do it for this rev.

> Add support for Java lambdas to PCollection/PTable methods
> ----------------------------------------------------------
>
>                 Key: CRUNCH-538
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-538
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>
>         Attachments: CRUNCH-538.patch
>
>
> Java 8 is more-or-less mainstream at this point, and lambdas are one of its best new
> features. Let's add lambda-friendly interfaces and methods to the PCollection/PTable classes
> modeled after the methods defined for Scrunch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
