beam-dev mailing list archives

From Kenneth Knowles <k...@apache.org>
Subject Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness
Date Wed, 06 Nov 2019 19:03:21 GMT
I think the portability framework is designed for this. The runner controls
the coder on the gRPC ports, and the runner controls the process bundle
descriptor.

I commented on the doc. I think what is missing is an analysis of the scope
of SDK harness changes and of the risk to model consistency:

    Approach 2: probably no SDK harness work, and compatible with the existing
Beam model, so no risk of introducing inconsistency

    Approach 1: what are all the details?
        option a: if the SDK harness has to understand "values without
windows", then very large changes and a high risk of introducing
inconsistency (I eliminated many of these inconsistencies)
        option b: if the coder just puts default window/timestamp/pane info
on elements, then it is the same as approach 2: no work, no risk
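
Option b can be sketched in a few lines of Python. `WindowedValue` and
`ConstantWindowingCoder` below are hypothetical stand-ins, not Beam classes:
the coder writes only the (already encoded) value bytes for a single element
and re-attaches a fixed timestamp, window set, and pane when decoding, so the
receiving side still sees an ordinary windowed value. Stream framing and real
window types are left out on purpose.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedValue:
    """Hypothetical stand-in for an element plus its windowing metadata."""
    value: bytes              # value already encoded by its own value coder
    timestamp_millis: int
    windows: Tuple[str, ...]
    pane: str


class ConstantWindowingCoder:
    """Hypothetical coder: only the value bytes cross the wire, and a fixed
    timestamp/windows/pane, chosen when the coder is built, is re-attached
    on decode."""

    def __init__(self, timestamp_millis: int, windows: Tuple[str, ...], pane: str):
        self._timestamp_millis = timestamp_millis
        self._windows = windows
        self._pane = pane

    def encode(self, wv: WindowedValue) -> bytes:
        # Refuse elements whose metadata differs from the constants, since
        # dropping it silently would change the element's meaning.
        if (wv.timestamp_millis, wv.windows, wv.pane) != (
            self._timestamp_millis, self._windows, self._pane
        ):
            raise ValueError("element metadata does not match the coder's constants")
        return wv.value

    def decode(self, data: bytes) -> WindowedValue:
        # Re-wrap the raw value bytes with the constant metadata.
        return WindowedValue(data, self._timestamp_millis, self._windows, self._pane)
```

Raising on mismatched metadata, instead of silently dropping it, is one way
to guard against the model-consistency risk mentioned above.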

Kenn

On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <sunjincheng121@gmail.com>
wrote:

> Hi all,
>
> I am trying to make some improvements to the portability framework to make
> it usable in other projects. However, we found that the coder between runner
> and harness can only be FullWindowedValueCoder. This means that each time we
> send a WindowedValue, we have to encode/decode the timestamp, windows and
> pane info. In some circumstances (such as using the portability framework in
> Flink), only values are needed between runner and harness. So it would be
> nice if we could configure the coder and avoid redundant encoding and
> decoding between runner and harness to improve performance.
>
> There are two approaches to solve this issue:
>
>     Approach 1:  Support ValueOnlyWindowedValueCoder between runner and
> harness.
>     Approach 2:  Add a "constant" window coder that embeds all the
> windowing information as part of the coder that should be used to wrap the
> value during decoding.
>
> More details can be found here [1].
>
> Because “Approach 2” still needs to encode/decode the timestamp and pane
> info, we lean toward “Approach 1”, which brings better performance and is
> more thorough.
>
> Any feedback is welcome :)
>
> Best,
> Jincheng
>
> [1]
> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>
>

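To put a rough number on the per-element overhead described in the quoted
message, here is a Python sketch comparing a full windowed-value encoding
with a value-only one. The byte layout (8-byte timestamp, 4-byte window
count, 1-byte pane, then the value) is a simplified assumption, not Beam's
exact wire format, and `encode_full` / `encode_value_only` are hypothetical
helpers.

```python
import struct


def encode_full(value: bytes, timestamp_millis: int, pane_byte: int) -> bytes:
    """Hypothetical full encoding: metadata header followed by the value."""
    # ">qiB": big-endian 8-byte timestamp, 4-byte window count, 1-byte pane,
    # with no padding (13 bytes total). The single global window is assumed
    # to encode to zero bytes of its own.
    header = struct.pack(">qiB", timestamp_millis, 1, pane_byte)
    return header + value


def encode_value_only(value: bytes) -> bytes:
    """Hypothetical value-only encoding: just the value bytes."""
    return value


payload = b"hello"
full = encode_full(payload, timestamp_millis=0, pane_byte=0)
value_only = encode_value_only(payload)
print(len(full), len(value_only))  # 18 5
```

In this layout a 5-byte value grows to 18 bytes; besides the extra bytes,
every element also pays the CPU cost of encoding and decoding metadata that
the receiving side does not need.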