beam-dev mailing list archives

From Ismaël Mejía <ieme...@gmail.com>
Subject Re: [thread fork] Apache Beam & Google Cloud Dataflow
Date Fri, 17 Jun 2016 07:46:13 GMT
Hello Frances,

Thanks for clearing this up. I hope you (Google) can somehow make this
official (maybe in the FAQ too), so that users know they can 'experiment'
with moving their code bases to Beam (without support until the official
release). Anyway, it is great to know that this works (at least from an
unsupported but technically feasible point of view).

Ismaël

On Fri, Jun 17, 2016 at 9:14 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Hi Frances,
>
> thanks for the details (and I like your Google hat ;)). I was speaking
> from a purely technical standpoint ;)
>
> Regards
> JB
>
>
> On 06/17/2016 07:21 AM, Frances Perry wrote:
>
>> With my Google employee hat on, I'd like to soften that claim a little ;-)
>>
>> Currently, the Beam SDK runs against Google Cloud Dataflow. But since Beam
>> isn't itself ready for prime time yet, Google doesn't officially provide
>> support for running Beam on Cloud Dataflow right now, and Google Cloud
>> Dataflow customers should still use the original Dataflow Java SDK.
>>
>> But I, for one, am looking forward to this evolving over the next few
>> months as Beam stabilizes ;-D
>>
>>
>> On Thu, Jun 16, 2016 at 9:50 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> wrote:
>>
>>> Hi,
>>>
>>> as long as you use the Beam Dataflow runner, it should work smoothly.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 06/16/2016 10:05 PM, Ismaël Mejía wrote:
>>>
>>>> Hello,
>>>>
>>>> One additional comment / question. I just noticed that Beam users can
>>>> already write their Beam pipelines and execute them on the Google
>>>> Dataflow runner.
>>>>
>>>> I just did the test today and I was thrilled to confirm that it worked
>>>> (as JB told me).
>>>>
>>>> You can look at the SDK version in the image:
>>>> https://imgur.com/k9HnLnv
>>>>
>>>> The question is: is this some kind of beta, or is this going to be
>>>> supported during the transition (before the formal 1.0 release)? I ask
>>>> this because I suppose many current Google users hesitate to move to
>>>> Beam for the moment because they don't know that they can already run
>>>> their pipelines on the Google Cloud Dataflow service. I think it is a
>>>> good idea to encourage users to move their data processing pipelines
>>>> to the Beam version.
>>>>
>>>> Regards,
>>>> Ismaël
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 11:21 PM, James Malone <
>>>> jamesmalone@google.com.invalid> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> This is a thread fork from the email thread titled '[dev] Announcing
>>>>> 0.1.0-incubating release'.
>>>>>
>>>>> In that thread, Amir posed a good question:
>>>>>
>>>>>      Why is still "Google Cloud Dataflow" included in the Beam release
>>>>>      if Beam is indeed an evolution (super-set?) of "Google Cloud
>>>>>      Dataflow". Thanks +regards, Amir-
>>>>>
>>>>> Many parts of Apache Beam are based on work from Google Cloud Dataflow,
>>>>> including the Dataflow (now Beam) model, SDKs (Java and Python), and
>>>>> some of the runners. This work was combined with awesome contributions
>>>>> from other groups (data Artisans/Apache Flink, Cloudera & PayPal/Apache
>>>>> Spark, etc.) to form the basis for Apache Beam[1]. Originally, the Cloud
>>>>> Dataflow SDK included machinery so Dataflow pipelines could be executed
>>>>> on Google Cloud Dataflow.
>>>>>
>>>>> An important part of Apache Beam is the ability to execute Beam
>>>>> pipelines on many runners (see the capability matrix[2] for full
>>>>> details and support). The Beam project includes a runner for Google
>>>>> Cloud Dataflow, along with others, such as runners for Apache Flink and
>>>>> Apache Spark. We're also focused on (and excited about!) supporting and
>>>>> growing new runners. Because it is a separate runner, the work for
>>>>> supporting execution on Cloud Dataflow can be kept within the runner,
>>>>> separate from the larger Apache Beam effort.
>>>>>
>>>>> So, to summarize:
>>>>>
>>>>> Beam is based on work from Google Cloud Dataflow so it's definitely an
>>>>> evolution. Additionally, Beam includes a runner (one of many) for
>>>>> Google's Cloud Dataflow service.
>>>>>
>>>>> Hope that helps!
>>>>>
>>>>> James
>>>>>
>>>>> [1]: http://wiki.apache.org/incubator/BeamProposal
>>>>> [2]: http://beam.incubator.apache.org/capability-matrix
>>>>>
>>>>>
>>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
