crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Google Cloud Dataflow?
Date Sun, 25 Jan 2015 19:01:08 GMT
:)

I never want anyone to have to rewrite code in order to pick up and move it
to a different execution engine. At the very least, we should write a
wrapper that lets you run existing o.a.c.DoFn subclasses in Dataflow
pipelines, and maybe even a DataflowPipeline implementation to port
existing Crunch pipelines over to Dataflow once it's easier to bring Hadoop
Input/OutputFormats over.

I have some thoughts on why I think Dataflow is interesting, esp. to
developers who are familiar with Crunch/Spark/Scalding, but I'll send them
out in a different email b/c they're getting kind of long.

J

On Sun, Jan 25, 2015 at 10:02 AM, Danny Morgan <unluckyboy@hotmail.com>
wrote:

> So should I start porting my crunch code over to the Cloud Dataflow sdk?
>
> Danny
>
