beam-commits mailing list archives

From "Valentyn Tymofieiev (JIRA)" <>
Subject [jira] [Commented] (BEAM-2600) Artifact for Python SDK harness that can be referenced in pipeline definition
Date Wed, 12 Jul 2017 06:09:00 GMT


Valentyn Tymofieiev commented on BEAM-2600:

The Dataflow Java runner has a workerHarnessContainerImage pipeline option, although it is specific
to the Dataflow runner. I had a proposal[1] to introduce a runner-independent option, but I came
to realize we may need finer granularity than a single SDK harness per pipeline: we need to be able
to specify the SDK harness separately for each component of the pipeline, such as a DoFn or other
SDK function. The Beam FnAPI vision suggests running the SDK harness as a containerized process,
so I could see sdk_harness_container_image eventually becoming a parameter of SdkFunctionSpec, but
we would first have to clarify the specification and expectations for SDK harness containers in Beam.
The question of where to put the information about the SDK harness, and how runners will use
it, should not be specific to any particular SDK language.
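To illustrate the granularity argument, here is a minimal Python sketch in which every function spec names its own harness environment, so one pipeline can mix harnesses. Everything in it is hypothetical: `Environment`, `SdkFunctionSpec`, and `group_by_harness` are simplified stand-ins rather than Beam's Runner API protos, and the image names and URNs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    # Hypothetical: an SDK harness identified only by a container image.
    container_image: str


@dataclass(frozen=True)
class SdkFunctionSpec:
    # Hypothetical stand-in: which harness runs the function, plus an
    # opaque URN/payload pair that only that harness interprets.
    environment: Environment
    urn: str
    payload: bytes


def group_by_harness(transforms):
    """Map each distinct harness image to the transforms it must execute.

    A runner could start one harness per key without knowing which SDK
    language is inside any of the images.
    """
    groups = defaultdict(list)
    for name, fn_spec in transforms.items():
        groups[fn_spec.environment.container_image].append(name)
    return dict(groups)


# Placeholder images and URNs, for illustration only.
python_env = Environment("example.com/python-sdk-harness:latest")
java_env = Environment("example.com/java-sdk-harness:latest")

transforms = {
    "ParseLines": SdkFunctionSpec(python_env, "urn:example:python-dofn", b"..."),
    "Enrich": SdkFunctionSpec(java_env, "urn:example:java-dofn", b"..."),
    "Format": SdkFunctionSpec(python_env, "urn:example:python-dofn", b"..."),
}

print(group_by_harness(transforms))
```

A single pipeline-wide option such as workerHarnessContainerImage can name only one image, so it cannot express the mixed grouping above; in this sketch the runner needs only the image strings, not any knowledge of the SDK inside them.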


> Artifact for Python SDK harness that can be referenced in pipeline definition
> -----------------------------------------------------------------------------
>                 Key: BEAM-2600
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Kenneth Knowles
>            Assignee: Ahmet Altay
>              Labels: beam-python-everywhere
> In order to build a pipeline that invokes a Python UDF, we need to be able to construct
> something like this:
> {code}
> SdkFunctionSpec {
>   environment = <python SDK harness>,
>   spec = {
>     urn = <python SDK pickled DoFn>,
>     data = <pickled DoFn>
>   }
> }
> {code}
> I could be out of date, but based on a couple of conversations, I am not aware of anything
> we can put for "<python SDK harness>" today. For prototyping, it could be just a symbol that
> runners have to know. But eventually it should be something that runners can instantiate
> without knowing anything about the SDK that put it there. I imagine it may eventually
> encompass "custom containers", though that doesn't block anything immediately.
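The structure in the issue can be filled in with a short Python sketch. It assumes the standard-library `pickle` module as the serializer and a picklable `functools.partial` as a stand-in for a user DoFn; the image name and URN are placeholders for the undefined `<python SDK harness>` and `<python SDK pickled DoFn>` values, and the dataclasses are simplified stand-ins, not Beam's protos. The runner only needs to read `environment`; the payload stays opaque until the Python harness decodes it.

```python
import functools
import operator
import pickle
from dataclasses import dataclass

# Placeholders for the undefined <python SDK harness> and
# <python SDK pickled DoFn> values in the issue (hypothetical).
PYTHON_HARNESS_IMAGE = "example.com/python-sdk-harness:latest"
PICKLED_DOFN_URN = "urn:example:python:pickled_dofn:v1"


@dataclass(frozen=True)
class FunctionSpec:
    urn: str
    data: bytes  # opaque to the runner


@dataclass(frozen=True)
class SdkFunctionSpec:
    environment: str  # something a runner can launch without SDK knowledge
    spec: FunctionSpec


def make_spec(user_fn):
    """SDK side: wrap a picklable user function in an SdkFunctionSpec."""
    return SdkFunctionSpec(
        environment=PYTHON_HARNESS_IMAGE,
        spec=FunctionSpec(urn=PICKLED_DOFN_URN, data=pickle.dumps(user_fn)),
    )


def harness_process(fn_spec, elements):
    """Harness side: only the Python harness knows how to decode this URN."""
    if fn_spec.spec.urn != PICKLED_DOFN_URN:
        raise ValueError(f"unsupported URN: {fn_spec.spec.urn}")
    user_fn = pickle.loads(fn_spec.spec.data)
    return [user_fn(element) for element in elements]


# A picklable stand-in for a DoFn that adds one to each element.
fn_spec = make_spec(functools.partial(operator.add, 1))
print(fn_spec.environment)                  # example.com/python-sdk-harness:latest
print(harness_process(fn_spec, [1, 2, 3]))  # [2, 3, 4]
```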

