From: Reuven Lax
Date: Sat, 05 May 2018 06:43:13 +0000
Subject: Re: Graal instead of docker?
To: dev@beam.apache.org

I don't believe we enforce docker anywhere.
In fact, if someone wanted to run an all-Windows Beam cluster, they would probably not use docker for their runner (docker runs on Windows, but not efficiently).

On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibucau@gmail.com> wrote:

> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apilloud@google.com>:
>
>> What docker really buys is a package format and runtime environment that is language and operating system agnostic. The docker packaging and runtime format is the de facto standard for portable applications such as this, and there is a group trying to turn it into an actual standard.
>>
>> I would agree with you that dockerd has become bloated, but there are projects that solve that. There is no longer lock-in to dockerd; there are package-format-compatible docker replacements that eliminate the performance issues and overhead associated with docker. CRI-O (https://github.com/kubernetes-incubator/cri-o) is a really cool Red Hat project which is a minimalist replacement for docker. I was recently working at a startup where I migrated our "data mover" appliance from Docker to CRI-O. Our application was able to get direct access to the ethernet driver and block devices, which enabled a huge performance boost, but we were also able to run containers produced by docker without modification.
>>
>> You mention that docker is "detail of one runner+vendor corrupting all the project and adding complexity and work to everyone". It sounds like you have a specific example you'd like to share? Is there a runner that is unable to move to portability because of docker?
>
> The IBM one for instance, some custom ones like a Hazelcast-based one, etc. More generally, any runner developed outside Beam itself; even if we take a snapshot today, most of Beam's own runners have the same pitfall.
>
> Note: I never said docker was a bad technology or anything like that. Let me try to clarify.
>
> The main issue is that you enforce docker usage, which is still trendy.
> It is like Scala, which promised to kill Java; check what it does today...
> It is starting to be tooled, but it has a big impact on the deployment side, and for a good number of Beam users who deploy it outside the cloud it is an issue.
> Keep in mind Beam is embeddable by design; it is not a runner environment, and the docker choice imposes an environment, which is inconsistent with Beam's design itself. This is where this choice blocks.
>
>> Andrew
>>
>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <herohde@google.com> wrote:
>>
>>> Romain,
>>>
>>> Docker, unlike SELinux, solves a great number of tangible problems for us with IMO a relatively small tax. It does not have to be the only way. Some of the concerns you bring up, along with possibilities, were also discussed here: https://s.apache.org/beam-fn-api-container-contract. I encourage you to take a look.
>>>
>>> Thanks,
>>> Henning
>>>
>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
>>>
>>>> On 4 May 2018 21:31, "Henning Rohde" <herohde@google.com> wrote:
>>>>
>>>> I disagree with the characterization of docker and the implications made towards portability. Graal looks like a neat project (and I never thought I would live to see the phrase "Practical Partial Evaluation" ..), but it doesn't address the needs of portability. In addition to Luke's examples, Go and most other languages don't work on it either. Docker containers also address packaging, OS dependencies, conflicting versions and distribution aspects in addition to truly universal language support.
>>>>
>>>> This is wrong: docker also has its conflicts and is not universal (it fails easily on Windows and Mac, as host or not, and cloud vendors put layers on it that limit or corrupt it); it is also an imposed infra constraint and a vendor lock-in that is not welcome in Beam IMHO.
>>>>
>>>> This is my main concern.
>>>> All the work done looks like an implementation detail of one runner+vendor corrupting all the project and adding complexity and work to everyone instead of keeping it localised (technically it is possible).
>>>>
>>>> Would you accept it if I forced you to use SELinux? Using docker is the same kind of constraint.
>>>>
>>>> That said, it's entirely fine for some runners to use Jython, Graal, etc. to provide a specialized offering similar to the direct runners, but it would be disjoint from portability IMO.
>>>>
>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
>>>>
>>>>> On 4 May 2018 17:55, "Lukasz Cwik" <lcwik@google.com> wrote:
>>>>>
>>>>> I did take a look at Graal a while back when thinking about how execution environments could be defined; my concerns were related to it not supporting all of the features of a language.
>>>>> For example, it's typical for Python to load and call native libraries, and Graal can only execute C/C++ code that has been compiled to LLVM.
>>>>> Also, a good number of people interested in using ML libraries will want access to GPUs to improve performance, which I believe Graal can't support.
>>>>>
>>>>> It can be a very useful way to run simple lambda functions written in some language directly without needing to use a docker environment, but you could probably use something even lighter weight than Graal that is language specific, like Jython.
>>>>>
>>>>> Right, the jsr223 impl works very well, but you can also get a perf boost using native bindings (like a V8 Java binding for JS, for instance). It is way more efficient than docker most of the time and not code-intrusive at all in runners, so likely more adoptable and maintainable. That said, all of this is doable behind jsr223, so maybe not a big deal in terms of API.
>>>>> We just need to ensure the portability work stays clean and actually portable, and doesn't impact runners the way the PoCs done until today did.
>>>>>
>>>>> Works for me.
>>>>>
>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
>>>>>
>>>>>> Hi guys
>>>>>>
>>>>>> For some time there have been efforts to have language-portable support in Beam, but I can't really find a case where it "works" being based on docker, except for some vendor-specific infra.
>>>>>>
>>>>>> Current solution:
>>>>>>
>>>>>> 1. Is runner-intrusive (which is bad for Beam and prevents adoption by big data vendors)
>>>>>> 2. Is based on docker (which assumes a runtime environment, is very ops/infra-intrusive, and is quite often too $$ for what it brings)
>>>>>>
>>>>>> Has anyone had a look at Graal, which seems like a way to make the feature doable in a lighter and more optimized manner than the default jsr223 impls?
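The alternative Lukasz and Romain describe in the thread, running a simple lambda written in another language directly inside the runner's JVM through JSR-223 rather than in a docker environment, can be sketched with the standard `javax.script` API. This is a minimal, hypothetical illustration, not a Beam API: the `Jsr223Lambda` class and its `compile` helper are made-up names, and the sketch assumes a JSR-223 engine is registered under the requested name (Nashorn answers to `javascript` on JDK 8 through 14 and was removed in JDK 15; a `python` engine would need Jython on the classpath). When no such engine exists it returns an empty `Optional`, so a caller could fall back to another execution environment.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper (not part of Beam): wraps a one-argument script function as a Java Function.
public class Jsr223Lambda {

  /**
   * Evaluates `source` in the JSR-223 engine registered under `engineName` and returns a Java
   * Function that calls the script function `fnName`. Returns empty if no Invocable engine is
   * registered under that name in this JVM (assumption: the engine is on the classpath).
   */
  static Optional<Function<Object, Object>> compile(String engineName, String source, String fnName) {
    ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
    if (engine == null || !(engine instanceof Invocable)) {
      return Optional.empty(); // no suitable engine: the caller must choose another environment
    }
    try {
      engine.eval(source); // defines fnName in the engine's global scope
    } catch (ScriptException e) {
      throw new IllegalArgumentException("script failed to evaluate", e);
    }
    Invocable invocable = (Invocable) engine;
    Function<Object, Object> fn = arg -> {
      try {
        return invocable.invokeFunction(fnName, arg);
      } catch (ScriptException | NoSuchMethodException e) {
        throw new RuntimeException("script function call failed", e);
      }
    };
    return Optional.of(fn);
  }

  public static void main(String[] args) {
    // List the engines this JVM actually provides; the list may be empty on JDK 15+.
    List<ScriptEngineFactory> factories = new ScriptEngineManager().getEngineFactories();
    for (ScriptEngineFactory f : factories) {
      System.out.println(f.getEngineName() + " " + f.getNames());
    }
    Optional<Function<Object, Object>> upper =
        compile("javascript", "function upper(s) { return String(s).toUpperCase(); }", "upper");
    System.out.println(upper.map(f -> f.apply("beam")).orElse("no javascript engine available"));
  }
}
```

As Lukasz points out, this only covers self-contained functions: code that loads native libraries or needs GPUs is beyond what such an in-JVM engine can run.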