From: Robert Bradshaw <robertwb@google.com>
Date: Thu, 13 Sep 2018 14:39:14 +0200
Subject: Re: [Discuss] Upgrade story for Beam's execution engines
To: dev@beam.apache.org
The ideal long-term solution is, as Romain mentions, pushing the
runner-specific code up to be maintained by each runner, with a stable API
to use to talk to Beam. Unfortunately, I think we're still a long way from
having this stable API, or having the clout for non-Beam developers to
maintain these bindings externally (though hopefully we'll get there).

In the short term, we're stuck with either hurting users that want to stick
with Flink 1.5, hurting users that want to upgrade to Flink 1.6, or
supporting both. Is Beam's interaction with Flink such that we can't simply
have separate targets linking the same Beam code against one or the other?
(I.e. are code changes needed?) If so, we'll probably need a
flink-runner-1.5 module, a flink-runner-1.6 module, and a
flink-runner-common module. Or we hope that all users are happy with 1.5
until a certain point in time when they all want to jump to 1.6 and Beam
simultaneously. Maybe that's enough in the short term, but longer term we
need a more sustainable solution.

On Thu, Sep 13, 2018 at 7:13 AM Romain Manni-Bucau <rmannibucau@gmail.com> wrote:

> Hi guys,
>
> Isn't the issue "only" that Beam has this code instead of the engines?
>
> Assuming Beam's runner-facing API is stable - which must be the case
> anyway - and that each engine has its own integration (flink-beam instead
> of beam-runners-flink), then this issue disappears by construction.
>
> It also has the advantage of easier maintenance.
>
> Side note: this is what happened with Arquillian: originally the
> community did all the adapter implementations, then each vendor took them
> back in-house to make them better.
>
> Any way to work in that direction, maybe?
>
> On Thu, Sep 13, 2018 at 00:49, Thomas Weise <thw@apache.org> wrote:
>
>> The main problem here is that users are forced to upgrade infrastructure
>> to obtain new features in Beam, even when those features actually don't
>> require such changes. As an example, another update to Flink 1.6.0 was
>> proposed (without supporting new functionality in Beam) and we already
>> know that it breaks compatibility (again).
>>
>> I think that upgrading to a Flink X.Y.0 version isn't a good idea to
>> start with. But besides that, if we want to grow adoption, then we need
>> to focus on stability and delivering improvements to Beam without
>> disrupting users.
>>
>> In this specific case, ideally the surface of Flink would be backward
>> compatible, allowing us to stick to a minimum version and be able to
>> submit pipelines to Flink endpoints of higher versions. Some work in that
>> direction is underway (like versioning the REST API). FYI, lowest common
>> version is what most projects that depend on Hadoop 2.x follow.
>>
>> Since Beam with a Flink 1.5.x client won't talk to Flink 1.6, and there
>> are code changes required to make it compile, we would need to come up
>> with a more involved strategy to support multiple Flink versions. Till
>> then, I would prefer we favor existing users over short-lived
>> experiments, which would mean sticking with 1.5.x and not supporting
>> 1.6.0.
>>
>> Thanks,
>> Thomas
>>
>> On Wed, Sep 12, 2018 at 1:15 PM Lukasz Cwik <lcwik@google.com> wrote:
>>
>>> As others have already suggested, I also believe LTS releases are the
>>> best we can do as a community right now, until portability allows us to
>>> decouple what a user writes with and how it runs (the SDK and the SDK
>>> environment) from the runner (job service + shared common runner libs +
>>> Flink/Spark/Dataflow/Apex/Samza/...).
>>>
>>> Dataflow would be highly invested in having the appropriate tooling
>>> within Apache Beam to support multiple SDK versions against a runner.
>>> This in turn would allow people to use any SDK with any runner and, as
>>> Robert had mentioned, certain optimizations and features would be
>>> disabled depending on the capabilities of the runner and the
>>> capabilities of the SDK.
>>>
>>> On Wed, Sep 12, 2018 at 6:38 AM Robert Bradshaw <robertwb@google.com> wrote:
>>>
>>>> The target audience is people who want to use the latest Beam but do
>>>> not want to use the latest version of the runner, right?
>>>>
>>>> I think this will be somewhat (though not entirely) addressed by Beam
>>>> LTS releases, where those not wanting to upgrade the runner at least
>>>> have a well-supported version of Beam. In the long term, we have the
>>>> division
>>>>
>>>>     Runner <-> BeamRunnerSpecificCode <-> CommonBeamRunnerLibs <-> SDK
>>>>
>>>> (which applies to job submission as well as execution).
>>>>
>>>> Insomuch as the BeamRunnerSpecificCode uses the public APIs of the
>>>> runner, hopefully upgrading the runner for minor versions should be a
>>>> no-op, and we can target the lowest version of the runner that makes
>>>> sense, allowing the user to link against higher versions at his or her
>>>> discretion. We should provide build targets that allow this. For major
>>>> versions, it may make sense to have two distinct BeamRunnerSpecificCode
>>>> libraries (which may or may not share some common code). I hope these
>>>> wrappers are not too thick.
>>>>
>>>> There is tight coupling at the BeamRunnerSpecificCode <->
>>>> CommonBeamRunnerLibs layer, but hopefully the bulk of the code lives on
>>>> the right-hand side and can be updated as needed independently of the
>>>> runner. There may be code of the form "if the runner supports X, do
>>>> this fast path; otherwise, do this slow path (or reject the pipeline)."
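A minimal sketch of what such a capability check might look like. Everything here is hypothetical for illustration; these are not actual Beam APIs or capability names:

```java
// Hypothetical sketch of runner-capability dispatch: translation code takes a
// fast path only when the runner advertises support for a feature, and falls
// back to a portable-but-slower expansion otherwise. All names are invented.
import java.util.EnumSet;
import java.util.Set;

public class CapabilityDispatch {
    // Illustrative capability flags a runner might advertise.
    enum Capability { SPLITTABLE_DOFN, NATIVE_TIMERS }

    static String translate(Set<Capability> runnerCaps) {
        // Fast path only when the capability is present; otherwise fall back.
        if (runnerCaps.contains(Capability.SPLITTABLE_DOFN)) {
            return "fast-path";
        }
        return "slow-path";
    }

    public static void main(String[] args) {
        System.out.println(translate(EnumSet.of(Capability.SPLITTABLE_DOFN)));
        System.out.println(translate(EnumSet.noneOf(Capability.class)));
    }
}
```

A third option per the quoted text is to reject the pipeline outright when no slow path exists; that would just be an exception in place of the fallback branch.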
>>>> I hope the CommonBeamRunnerLibs <-> SDK coupling is fairly loose, to
>>>> the point that one could use SDKs from different versions of Beam (or
>>>> even developed outside of Beam) with an older/newer runner. We may need
>>>> to add versioning to the Fn/Runner/Job API itself to support this.
>>>> Right now, of course, we're still in a pre-1.0, rapid-development phase
>>>> wrt this API.
>>>>
>>>> On Wed, Sep 12, 2018 at 2:10 PM Etienne Chauchot <echauchot@apache.org> wrote:
>>>>
>>>>> Hi Max,
>>>>>
>>>>> I totally agree with your points, especially the users' priorities
>>>>> (stick to the already-working version) and the need to leverage
>>>>> important new features. It is indeed a difficult balance to find.
>>>>>
>>>>> I can talk for a part I know: for the Spark runner, the aim was to
>>>>> support the native Spark Dataset API (in place of RDDs). For that we
>>>>> needed to upgrade to Spark 2.x (and we will probably leverage Beam Row
>>>>> as well).
>>>>> But such an upgrade is a good amount of work, which makes it difficult
>>>>> to commit to a schedule such as "if there is a major new feature on an
>>>>> execution engine that we want to leverage, then the upgrade in Beam
>>>>> will be done within x months".
>>>>>
>>>>> Regarding your point on portability: decoupling the SDK from the
>>>>> runner with a runner harness and an SDK harness might make pipeline
>>>>> authors' work easier regarding pipeline maintenance. But, still, if we
>>>>> upgrade the runner libs, then users might find their runner harness
>>>>> does not work with their engine version.
>>>>> If such SDK/runner decoupling is 100% functional, then we could
>>>>> imagine having multiple runner harnesses shipping different versions
>>>>> of the runner libs to solve this problem.
>>>>> But we would need to support more than one version of the runner libs.
>>>>> We chose not to do this for the Spark runner.
>>>>>
>>>>> WDYT?
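On the Fn/Runner/Job API versioning point raised above, a toy sketch of what version negotiation between an SDK client and a runner endpoint could look like. This is entirely hypothetical; it is not actual Beam portability code:

```java
// Hypothetical version negotiation: each side advertises the API versions it
// supports, and both agree on the highest version in common (or fail cleanly).
// The protocol and names are invented for illustration.
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ApiVersionNegotiation {
    // Pick the highest API version supported by both client and server, if any.
    static Optional<Integer> negotiate(List<Integer> clientVersions,
                                       List<Integer> serverVersions) {
        return clientVersions.stream()
                .filter(serverVersions::contains)
                .max(Integer::compare);
    }

    public static void main(String[] args) {
        // Both support 2 and 3, so they agree on 3.
        System.out.println(negotiate(Arrays.asList(1, 2, 3), Arrays.asList(2, 3, 4)));
        // No overlap: the submission would be rejected with a clear error.
        System.out.println(negotiate(Arrays.asList(1), Arrays.asList(2)));
    }
}
```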
>>>>> Best,
>>>>> Etienne
>>>>>
>>>>> On Tue, Sep 11, 2018 at 15:42 +0200, Maximilian Michels wrote:
>>>>>
>>>>> Hi Beamers,
>>>>>
>>>>> In the light of the discussion about Beam LTS releases, I'd like to
>>>>> kick off a thread about how often we upgrade the execution engine of
>>>>> each Runner. By upgrade, I mean major/minor versions which typically
>>>>> break the binary compatibility of Beam pipelines.
>>>>>
>>>>> For the Flink Runner, we try to track the latest stable version. Some
>>>>> users reported that this can be problematic, as it requires them to
>>>>> potentially upgrade their Flink cluster with a new version of Beam.
>>>>>
>>>>> From a developer's perspective, it makes sense to migrate as early as
>>>>> possible to the newest version of the execution engine, e.g. to
>>>>> leverage the newest features. From a user's perspective, you don't
>>>>> care about the latest features if your use case still works with Beam.
>>>>>
>>>>> We have to please both parties. So I'd suggest upgrading the execution
>>>>> engine whenever necessary (e.g. critical new features, end of life of
>>>>> the current version). On the other hand, the upcoming Beam LTS
>>>>> releases will contain a longer-supported version.
>>>>>
>>>>> Maybe we don't need to discuss much about this, but I wanted to hear
>>>>> what the community has to say about it. Particularly, I'd be
>>>>> interested in how the other Runner authors intend to handle it.
>>>>>
>>>>> As far as I understand, with portability being stable, we could
>>>>> theoretically upgrade the SDK without upgrading the runtime
>>>>> components. That would allow us to defer the upgrade for a longer
>>>>> time.
>>>>> Best,
>>>>> Max
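To make the flink-runner-1.5 / flink-runner-1.6 / flink-runner-common split suggested at the top of the thread concrete, here is one hedged sketch of how the code could be organized. All class and method names are invented; none of this is actual Beam code:

```java
// Hypothetical sketch of the module split: flink-runner-common would own a
// thin adapter interface, while flink-runner-1.5 and flink-runner-1.6 each
// ship an implementation compiled against their respective Flink client API.
// Everything below is illustrative, collapsed into one file for readability.
public class VersionAdapterSketch {

    // Lives in flink-runner-common: the only surface the shared code touches.
    interface FlinkApiAdapter {
        String submitPipeline(String jobName);
    }

    // Lives in flink-runner-1.5, compiled against the Flink 1.5 client.
    static class Flink15Adapter implements FlinkApiAdapter {
        public String submitPipeline(String jobName) {
            return "submitted " + jobName + " via Flink 1.5 client";
        }
    }

    // Lives in flink-runner-1.6, compiled against the Flink 1.6 client.
    static class Flink16Adapter implements FlinkApiAdapter {
        public String submitPipeline(String jobName) {
            return "submitted " + jobName + " via Flink 1.6 client";
        }
    }

    // Shared translation code in flink-runner-common depends only on the
    // adapter, never on a concrete Flink version.
    static String runShared(FlinkApiAdapter adapter, String jobName) {
        return adapter.submitPipeline(jobName);
    }

    public static void main(String[] args) {
        System.out.println(runShared(new Flink15Adapter(), "wordcount"));
        System.out.println(runShared(new Flink16Adapter(), "wordcount"));
    }
}
```

The design choice this encodes is the one debated in the thread: version-specific code stays small (the adapters), the bulk of the runner lives in the common module, and users pick the artifact matching their cluster.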