beam-dev mailing list archives

From Kenneth Knowles <>
Subject Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness
Date Wed, 06 Nov 2019 19:03:21 GMT
I think the portability framework is designed for this. The runner controls
the coder on the gRPC ports and the runner controls the process bundle

I commented on the doc. I think what is missing is an analysis of the scope
of SDK harness changes and of the risk to model consistency.

    Approach 2: probably no SDK harness work; compatible with the existing
Beam model, so no risk of introducing inconsistency.

    Approach 1: what are all the details?
        option a: if the SDK harness has to understand "values without
windows", then this means very large changes and a high risk of introducing
inconsistency (I eliminated many of these inconsistencies).
        option b: if the coder just puts default window/timestamp/pane info
on elements, then it is the same as Approach 2: no work, no risk.
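
To make the comparison concrete, here is a minimal sketch of the idea behind option b and Approach 2, assuming a toy wire format. Every name in it (WindowedValue, FullCoder, ConstantWindowCoder, the length-prefixed byte layout) is a hypothetical stand-in for illustration and not Beam's actual classes or encoding. Only the value crosses the wire, and the decoder re-attaches fixed window/timestamp/pane info, so the consumer still sees a complete windowed value.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedValue:
    """Hypothetical stand-in for a windowed element (not Beam's class)."""
    value: bytes
    timestamp: int
    windows: Tuple[str, ...]
    pane: str


def _put(data: bytes) -> bytes:
    # Length-prefix a byte string with a 4-byte big-endian length.
    return struct.pack(">I", len(data)) + data


def _get(buf: bytes, pos: int):
    # Read one length-prefixed byte string; return it and the next offset.
    (n,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


class FullCoder:
    """Encodes timestamp, windows, pane and value for every element."""

    def encode(self, wv: WindowedValue) -> bytes:
        out = struct.pack(">q", wv.timestamp)
        out += struct.pack(">I", len(wv.windows))
        for w in wv.windows:
            out += _put(w.encode())
        out += _put(wv.pane.encode())
        return out + _put(wv.value)

    def decode(self, buf: bytes) -> WindowedValue:
        (ts,) = struct.unpack_from(">q", buf, 0)
        (count,) = struct.unpack_from(">I", buf, 8)
        pos = 12
        windows = []
        for _ in range(count):
            w, pos = _get(buf, pos)
            windows.append(w.decode())
        pane, pos = _get(buf, pos)
        value, pos = _get(buf, pos)
        return WindowedValue(value, ts, tuple(windows), pane.decode())


class ConstantWindowCoder:
    """Encodes only the value; the windowing info is a coder parameter."""

    def __init__(self, timestamp: int, windows, pane: str):
        self.timestamp = timestamp
        self.windows = tuple(windows)
        self.pane = pane

    def encode(self, wv: WindowedValue) -> bytes:
        return _put(wv.value)

    def decode(self, buf: bytes) -> WindowedValue:
        value, _ = _get(buf, 0)
        # Re-attach the constant info so downstream code sees a full element.
        return WindowedValue(value, self.timestamp, self.windows, self.pane)
```

In this sketch the receiving side never handles a "value without windows": ConstantWindowCoder.decode always returns a full WindowedValue, which is why this variant would need no change in how the harness treats elements, while the per-element payload shrinks to the value alone.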


On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <> wrote:

> Hi all,
> I am trying to make some improvements to the portability framework to make it
> usable in other projects. However, we find that the coder between the runner
> and the harness can only be FullWindowedValueCoder. This means that each time
> we send a WindowedValue, we have to encode/decode the timestamp, windows and
> pane info. In some circumstances (such as using the portability framework in
> Flink), only values are needed between runner and harness. So it would be
> nice if we could configure the coder and avoid redundant encoding and
> decoding between runner and harness to improve performance.
> There are two approaches to solve this issue:
>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner and
> harness.
>     Approach 2:  Add a "constant" window coder that embeds all the
> windowing information as part of the coder that should be used to wrap the
> value during decoding.
> More details can be found here [1].
> Because Approach 2 still needs to encode/decode the timestamp and pane
> info, we tend to prefer Approach 1, which gives better performance and is
> more thorough.
> Any feedback is welcome :)
> Best,
> Jincheng
> [1]
