beam-dev mailing list archives

From Romain Manni-Bucau <rmannibu...@gmail.com>
Subject Re: Graal instead of docker?
Date Wed, 09 May 2018 08:08:25 GMT
On Wed, May 9, 2018 at 00:57, Henning Rohde <herohde@google.com> wrote:

> There are indeed lots of possibilities for interesting docker alternatives
> with different tradeoffs and capabilities, but in general both the runner
> and the SDK must support them for this to work. As mentioned, docker
> (as used in the container contract) is meant as a flexible main option but
> not necessarily the only option. I see no problem with certain
> pipeline-SDK-runner combinations additionally supporting a specialized
> setup. The pipeline can be a factor, because some transforms might depend
> on aspects of the runtime environment -- such as system libraries or
> shelling out to a /bin/foo.
>
> The worker boot code is tied to the current container contract, so
> pre-launched workers would presumably not use that code path and would not
> be bound by its assumptions. In particular, such a setup might want to
> invert who initiates the connection from the SDK worker to the runner.
> Pipeline options and global state in the SDK and user-function process
> might make it difficult to safely reuse worker processes across pipelines,
> but reuse is doable in certain scenarios.
>

This is actually not that hard, and most Java environments do it.

The main concerns are 1. being tied to an implementation detail and 2. a bad
architecture which doesn't embrace the community.



> Henning
>
> On Tue, May 8, 2018 at 3:51 PM Thomas Weise <thw@apache.org> wrote:
>
>>
>>
>> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw <robertwb@google.com>
>> wrote:
>>
>>>
>>> I would welcome changes to
>>>
>>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>>> that would provide alternatives to docker (one that comes to mind is "I
>>> already brought up a worker(s) for you (which could be the same process
>>> that handled pipeline construction in testing scenarios), here's how to
>>> connect to it/them."). Another option, which would seem to appeal to you
>>> in particular, would be "the worker code is linked into the runner's
>>> binary, use this process as the worker" (though note that even for
>>> java-on-java, it can be advantageous to shield the worker and runner code
>>> from each other's environments, dependencies, and version requirements).
>>> This latter should still likely use the FnApi to talk to itself (either
>>> over gRPC on local ports, or possibly better via direct function calls,
>>> eliminating the RPC overhead altogether--this is how the fast local
>>> runner in Python works). There may be runner environments well controlled
>>> enough that "start up the workers" could be specified as "run this
>>> command line." We should make this environment message extensible to
>>> alternatives other than "docker container url," though of course we don't
>>> want the set of options to grow too large, or we lose the promise of
>>> portability unless every runner supports every protocol.
>>>
>>>
>> The pre-launched worker would be an interesting option, which might work
>> well for a sidecar deployment.
>>
>> The current worker boot code though makes the assumption that the runner
>> endpoint to phone home to is known when the process is launched. That
>> doesn't work so well with a runner that establishes its endpoint
>> dynamically. Also, the assumption is baked in that a worker will only serve
>> a single pipeline (provisioning API etc.).
>>
>> Thanks,
>> Thomas
>>
>>
>
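
The inverted connection direction discussed in the quoted thread (a worker that is already running and waits, and a runner that picks its endpoint dynamically and dials in) can be illustrated with a minimal sketch. This is not Beam's worker boot code or the FnApi: plain TCP sockets stand in for the real protocol, and the names `prelaunched_worker` and `runner_dials_prelaunched_worker` and the message format are assumptions made up for the example.

```python
import socket
import threading


def prelaunched_worker(listener: socket.socket) -> None:
    """Hypothetical worker: already running, waits for a runner to connect."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024)
        # Echo the request back with a prefix, standing in for real work.
        conn.sendall(b"ack:" + request)


def runner_dials_prelaunched_worker(message: bytes) -> bytes:
    """Hypothetical runner: connects to the worker instead of being called back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)            # avoid hanging forever if nobody connects
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    worker_port = listener.getsockname()[1]

    worker = threading.Thread(target=prelaunched_worker, args=(listener,))
    worker.start()
    try:
        # The runner initiates the connection; the worker needed no runner
        # endpoint at launch time.
        with socket.create_connection(("127.0.0.1", worker_port), timeout=5) as conn:
            conn.sendall(message)
            return conn.recv(1024)
    finally:
        worker.join()
        listener.close()
```

In this arrangement the worker only needs to publish its own address, which is the property that would suit a runner whose endpoint is not known when the worker process starts.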

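The idea of an environment description that is extensible beyond "docker container url" can be sketched as a small tagged union. This is only an illustration of the options named in the thread (docker, pre-launched worker, in-process worker, command line); the class names, field names, and `plan_worker_startup` are hypothetical and are not taken from `beam_runner_api.proto`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class DockerEnv:
    container_url: str          # hypothetical field name


@dataclass(frozen=True)
class PrelaunchedEnv:
    endpoint: str               # where the already-running worker can be reached


@dataclass(frozen=True)
class InProcessEnv:
    pass                        # worker code is linked into the runner's binary


@dataclass(frozen=True)
class CommandEnv:
    argv: Tuple[str, ...]       # "run this command line" to start a worker


Environment = Union[DockerEnv, PrelaunchedEnv, InProcessEnv, CommandEnv]


class UnsupportedEnvironment(Exception):
    """Raised when a runner does not support the requested environment kind."""


def plan_worker_startup(env: Environment, supported: set) -> Tuple[str, str]:
    """Return an (action, argument) pair describing how a runner would proceed."""
    if type(env) not in supported:
        # Portability erodes here: each extra kind is one more thing that
        # every runner would have to implement.
        raise UnsupportedEnvironment(type(env).__name__)
    if isinstance(env, DockerEnv):
        return ("start_container", env.container_url)
    if isinstance(env, PrelaunchedEnv):
        return ("connect", env.endpoint)
    if isinstance(env, InProcessEnv):
        return ("use_current_process", "")
    if isinstance(env, CommandEnv):
        return ("run_command", " ".join(env.argv))
    raise UnsupportedEnvironment(type(env).__name__)
```

For example, a runner that only supports docker would reject `plan_worker_startup(InProcessEnv(), {DockerEnv})`, which is the portability tradeoff raised in the thread: the more environment kinds exist, the fewer runner and SDK pairs support all of them.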