flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: refactor StreamConfig
Date Tue, 04 Jul 2017 14:43:52 GMT
I think the proposed changes are good; I just wanted to make sure that they don’t interfere
with what other people are doing.

I also proposed these steps on the GitHub PR:
Also, for actually doing the changes I suggest separate steps, i.e. separate commits, possibly
in separate PRs to make reviewing easier and to keep the changes more isolated:

 - Rename StreamConfig to StreamTaskConfig and make it serialisable, instead of relying on
   an underlying Configuration. This means that the StreamTaskConfig itself has fields for storing
 - Introduce OperatorConfig and move only those fields that the operator should see from
   StreamTaskConfig to OperatorConfig. Initialize the operator with an OperatorConfig.
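A minimal sketch of the first step, assuming plain Java serialization is an acceptable way to ship the config: the task-level settings become typed fields of a serializable StreamTaskConfig instead of string keys in an underlying Configuration. The fields shown (taskName, numberOfInputs, checkpointingEnabled) and the ConfigShipping helper are hypothetical placeholders, not Flink's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical task-level config: settings are typed fields rather than
// string keys in an underlying Configuration.
class StreamTaskConfig implements Serializable {
    private static final long serialVersionUID = 1L;

    final String taskName;
    final int numberOfInputs;
    final boolean checkpointingEnabled;

    StreamTaskConfig(String taskName, int numberOfInputs, boolean checkpointingEnabled) {
        this.taskName = taskName;
        this.numberOfInputs = numberOfInputs;
        this.checkpointingEnabled = checkpointingEnabled;
    }
}

// Hypothetical helper that ships the whole config object as bytes.
class ConfigShipping {
    // Serialize the config object, e.g. to send it along with the task deployment.
    static byte[] toBytes(Serializable config) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restore the config on the receiving side, checking it has the expected type.
    static <T> T fromBytes(byte[] data, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With typed fields, a misspelled setting becomes a compile error rather than a silently missing key, at the cost of having to keep the class serialization-compatible across versions.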

Regarding what to put in the OperatorConfig and what in the StreamTaskConfig: why are these
still in the OperatorConfig?
       2)  streamOperator
       3)  input serializer.
       4)  output edges and serializers.
       5)  chain.index

I think only the StreamTask, which is responsible for building the OperatorChain, needs to have
that information.
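A minimal sketch of that division of responsibility, using hypothetical classes rather than Flink's real ones: ChainEntry holds what only the StreamTask needs to build the chain (the operator instance, the chain index and the output edges), while the operator's setup method receives nothing but an OperatorConfig with its name and id. Input serializers are omitted for brevity, and strings stand in for records and edges.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Operator-visible settings only: no chain index, edges or serializers.
final class OperatorConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String id;

    OperatorConfig(String name, String id) {
        this.name = name;
        this.id = id;
    }
}

// Hypothetical operator contract: setup sees only its OperatorConfig.
interface SketchOperator {
    void setup(OperatorConfig config);
    String process(String record);
}

class UppercaseOperator implements SketchOperator {
    private OperatorConfig config;

    @Override
    public void setup(OperatorConfig config) {
        this.config = config;
    }

    @Override
    public String process(String record) {
        // Wraps the upper-cased record in the operator's own name.
        return config.name + "(" + record.toUpperCase(Locale.ROOT) + ")";
    }
}

// Task-side information used to build the chain; never handed to the operator.
final class ChainEntry {
    final SketchOperator operator;
    final OperatorConfig operatorConfig;
    final int chainIndex;
    final List<String> outputEdges;

    ChainEntry(SketchOperator operator, OperatorConfig operatorConfig,
               int chainIndex, List<String> outputEdges) {
        this.operator = operator;
        this.operatorConfig = operatorConfig;
        this.chainIndex = chainIndex;
        this.outputEdges = outputEdges;
    }
}

final class SketchStreamTask {
    private final List<ChainEntry> chain = new ArrayList<>();

    // Only the task uses chainIndex: it orders the operators, then sets each one up.
    SketchStreamTask(List<ChainEntry> entries) {
        chain.addAll(entries);
        chain.sort(Comparator.comparingInt(e -> e.chainIndex));
        for (ChainEntry entry : chain) {
            entry.operator.setup(entry.operatorConfig);
        }
    }

    // Pushes one record through the chained operators in chain order.
    String run(String record) {
        String current = record;
        for (ChainEntry entry : chain) {
            current = entry.operator.process(current);
        }
        return current;
    }
}
```

For example, two UppercaseOperator entries named "first" (chain index 0) and "second" (chain index 1) turn the record "a" into "second(FIRST(A))", regardless of the order in which the entries are passed to the task.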


> On 4. Jul 2017, at 15:56, xu <xupingyong008@163.com> wrote:
> Hi all,
>      I am sorry for starting to work on StreamConfig (https://github.com/apache/flink/pull/4241) before discussing it, since it may conflict with other people's work.
>      Motivation:
>          A Task contains one or more chained operators; however, the configs of both the operators and the task are all put in StreamConfig. For example, when an operator is set up with the StreamConfig, it can see interfaces such as physicalEdges or chained.task.configs, which is confusing. Similarly, a StreamTask should not see the interface for chain.index.
>          So we need to separate an OperatorConfig from the StreamConfig. A StreamTask is initialized with the StreamConfig, then extracts the OperatorConfigs from it and builds a StreamOperator from each OperatorConfig.
>     OperatorConfig: used by the StreamOperator for its setup; it contains information that belongs only to that StreamOperator:
>        1)  operator information: name, id
>        2)  streamOperator
>        3)  input serializer.
>        4)  output edges and serializers.
>        5)  chain.index
>        6)  state.key.serializer
>      StreamConfig: for the streamTask to use:
>        1) in.physical.edges
>        2) out.physical.edges
>        3) chained OperatorConfigs
>        4) execution environment: checkpoint, state.backend and so on... 
>     Proposed changes
>       Overall, I propose the following changes:
>        1) Build the JobGraph from the StreamGraph
>        2) Set up the StreamOperator with an OperatorConfig, so the setup interface needs to change
>     (1) Build the JobGraph from the StreamGraph
>        During building, we first get the OperatorConfig of every StreamNode, and then put the OperatorConfigs of the StreamNodes into a StreamConfig when we chain them into a JobVertex.
>     (2) Set up the StreamOperator with an OperatorConfig
>        An OperatorConfig is provided instead of the StreamConfig when the StreamOperator is set up. Thanks to Stephan Ewen's advice, the OperatorConfig does not need a Map from "configKey" to values; it is just a serializable class with the respective fields. The StreamConfig still relies on an underlying Configuration, because the StreamConfig is shipped through its underlying Configuration.
>       Some people may have already thought about this, and someone may already be working on it. I would appreciate your advice.
>       Thanks a lot for replying, and best regards,
>       JiPing
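The two proposed steps can be sketched for the simplest case, assuming a linear pipeline in which each node already knows whether it can be chained to its successor. StreamNode, OperatorConfig, StreamConfig and JobGraphSketch here are hypothetical, heavily reduced stand-ins, not Flink's classes and not the chaining logic of Flink's StreamingJobGraphGenerator; a real StreamConfig would also carry the physical edges and execution settings listed in the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stream node of a linear pipeline.
final class StreamNode {
    final String operatorName;
    final boolean chainableWithNext;

    StreamNode(String operatorName, boolean chainableWithNext) {
        this.operatorName = operatorName;
        this.chainableWithNext = chainableWithNext;
    }

    // First get the OperatorConfig from the node.
    OperatorConfig toOperatorConfig(int chainIndex) {
        return new OperatorConfig(operatorName, chainIndex);
    }
}

// Reduced OperatorConfig: name plus chain.index, as in the proposal's list.
final class OperatorConfig {
    final String name;
    final int chainIndex;

    OperatorConfig(String name, int chainIndex) {
        this.name = name;
        this.chainIndex = chainIndex;
    }
}

// Reduced per-JobVertex StreamConfig holding the chained OperatorConfigs.
final class StreamConfig {
    private final List<OperatorConfig> chained = new ArrayList<>();

    void addChainedOperatorConfig(OperatorConfig config) {
        chained.add(config);
    }

    // What a StreamTask would read at initialization, one config per operator.
    List<OperatorConfig> getChainedOperatorConfigs() {
        return Collections.unmodifiableList(chained);
    }
}

final class JobGraphSketch {
    // Then put each node's OperatorConfig into the StreamConfig of the current
    // vertex, starting a new vertex after any node that cannot be chained onward.
    static List<StreamConfig> buildVertexConfigs(List<StreamNode> pipeline) {
        List<StreamConfig> vertices = new ArrayList<>();
        StreamConfig current = null;
        for (StreamNode node : pipeline) {
            if (current == null) {
                current = new StreamConfig();
                vertices.add(current);
            }
            int chainIndex = current.getChainedOperatorConfigs().size();
            current.addChainedOperatorConfig(node.toOperatorConfig(chainIndex));
            if (!node.chainableWithNext) {
                current = null; // the next node starts a new JobVertex
            }
        }
        return vertices;
    }
}
```

For a pipeline source (chainable) → map (not chainable) → sink, this yields two vertex configs: [source, map] with chain indices 0 and 1, and [sink] with chain index 0.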
