flink-dev mailing list archives

From xu <xupingyong...@163.com>
Subject refactor StreamConfig
Date Tue, 04 Jul 2017 13:56:23 GMT
Hi all,
     I apologize for starting work on StreamConfig (https://github.com/apache/flink/pull/4241)
before discussing it here; the change may conflict with others' work.

         A task contains one or more chained operators, yet the configuration of both the
operators and the task is stored in StreamConfig. For example, when an operator is set up with
the StreamConfig, it can see the interfaces for physicalEdges and chained.task.configs, which is
confusing. Similarly, a streamTask should not see the interface for chain.index.
         So we need to separate OperatorConfig from StreamConfig. A streamTask is initialized
with the streamConfig, extracts the operatorConfigs from it, and builds a streamOperator from
each operatorConfig.

     OperatorConfig: for the streamOperator to set up with. It contains only the information
that belongs to the streamOperator:
       1)  operator information: name, id
       2)  streamOperator
       3)  input serializer.
       4)  output edges and serializers.
       5)  chain.index
       6)  state.key.serializer
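
A minimal sketch of what such an OperatorConfig could look like as a plain serializable class. All names and types here (OperatorConfigSketch, the String stand-ins for serializers and edges, the roundTrip helper) are hypothetical placeholders for illustration, not Flink's actual API or the code in the pull request:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: field names and types are placeholders, not Flink's API.
final class OperatorConfigSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String operatorName;          // 1) operator information: name
    final String operatorId;            // 1) operator information: id
    final Serializable streamOperator;  // 2) stand-in for the operator itself
    final String inputSerializer;       // 3) stand-in for the input serializer
    final List<String> outputEdges;     // 4) stand-in for output edges and serializers
    final int chainIndex;               // 5) position of the operator in its chain
    final String stateKeySerializer;    // 6) null when the operator is not keyed

    OperatorConfigSketch(String operatorName, String operatorId,
                         Serializable streamOperator, String inputSerializer,
                         List<String> outputEdges, int chainIndex,
                         String stateKeySerializer) {
        this.operatorName = operatorName;
        this.operatorId = operatorId;
        this.streamOperator = streamOperator;
        this.inputSerializer = inputSerializer;
        // Copy into an ArrayList so the field is always serializable.
        this.outputEdges = new ArrayList<>(outputEdges);
        this.chainIndex = chainIndex;
        this.stateKeySerializer = stateKeySerializer;
    }

    // Serializes and deserializes the config to show it can be shipped as one object.
    static OperatorConfigSketch roundTrip(OperatorConfigSketch config) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(config);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (OperatorConfigSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The point of the sketch is only that each item in the list becomes a typed field, so an operator never sees task-level settings.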

     StreamConfig: for the streamTask to use:
       1) in.physical.edges
       2) out.physical.edges
       3) chained OperatorConfigs
       4) execution environment: checkpoint, state.backend and so on

    Proposed Change
      I propose the following overall changes:
       1) Build the jobGraph from the streamGraph
       2) Set up the StreamOperator with an operatorConfig, so the setup interface needs to change

    (1) Build jobGraph from streamGraph
       When building, we first get an operatorConfig from every streamNode. Then, when we
chain streamNodes into a jobVertex, we put their operatorConfigs into a streamConfig.
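
The chaining step could be sketched roughly as follows. This is an illustration only: NodeOperatorConfig, VertexStreamConfig, and JobGraphBuildSketch are hypothetical names, stream nodes are reduced to their names, and assigning chain indices in order starting at 0 is an assumption rather than something stated in the proposal:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-operator config; only the fields needed for the illustration.
final class NodeOperatorConfig {
    final String operatorName;
    final int chainIndex;

    NodeOperatorConfig(String operatorName, int chainIndex) {
        this.operatorName = operatorName;
        this.chainIndex = chainIndex;
    }
}

// Hypothetical task-level config for one jobVertex.
final class VertexStreamConfig {
    final List<String> inPhysicalEdges;
    final List<String> outPhysicalEdges;
    final List<NodeOperatorConfig> chainedOperatorConfigs;

    VertexStreamConfig(List<String> inPhysicalEdges, List<String> outPhysicalEdges,
                       List<NodeOperatorConfig> chainedOperatorConfigs) {
        this.inPhysicalEdges = new ArrayList<>(inPhysicalEdges);
        this.outPhysicalEdges = new ArrayList<>(outPhysicalEdges);
        this.chainedOperatorConfigs = new ArrayList<>(chainedOperatorConfigs);
    }
}

final class JobGraphBuildSketch {
    // Builds one operator config per chained node, then collects them
    // into the task-level config of the resulting vertex.
    static VertexStreamConfig chain(List<String> chainedNodeNames,
                                    List<String> inEdges, List<String> outEdges) {
        List<NodeOperatorConfig> configs = new ArrayList<>();
        for (int i = 0; i < chainedNodeNames.size(); i++) {
            // Assumption: the chain index is the position in the chain, from 0.
            configs.add(new NodeOperatorConfig(chainedNodeNames.get(i), i));
        }
        return new VertexStreamConfig(inEdges, outEdges, configs);
    }
}
```

Physical edges stay on the vertex-level object, while each operator config carries only what its operator needs.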

    (2) Set up StreamOperator with OperatorConfig
       An OperatorConfig is provided instead of the streamConfig when the streamOperator is
set up. Thanks to the advice of Stephan Ewen, OperatorConfig does not need a Map of "configKey"
to values; it is just a serializable class with the respective fields. StreamConfig still
relies on an underlying Configuration, because the streamConfig is passed along through that
underlying configuration.
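
A rough sketch of how the two pieces might fit together. A plain Map<String, byte[]> stands in for the underlying Configuration, and the key "chained.operator.configs", the classes OpConfig, TaskConfigSketch, SketchOperator, and RecordingOperator are all hypothetical names chosen for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical operator config: a serializable class with plain fields, no key map.
final class OpConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int chainIndex;

    OpConfig(String name, int chainIndex) {
        this.name = name;
        this.chainIndex = chainIndex;
    }
}

// Hypothetical task-level config that still sits on top of a key-value store.
final class TaskConfigSketch {
    static final String CHAINED_KEY = "chained.operator.configs"; // hypothetical key
    private final Map<String, byte[]> configuration; // stand-in for a Configuration

    TaskConfigSketch(Map<String, byte[]> configuration) {
        this.configuration = configuration;
    }

    // Stores the operator configs as one serialized value under a single key.
    void setChainedOperatorConfigs(List<OpConfig> configs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(configs));
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize configs", e);
        }
        configuration.put(CHAINED_KEY, bytes.toByteArray());
    }

    // Reads the operator configs back, as a task would do before building operators.
    @SuppressWarnings("unchecked")
    List<OpConfig> getChainedOperatorConfigs() {
        byte[] raw = configuration.get(CHAINED_KEY);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<OpConfig>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize configs", e);
        }
    }
}

// Hypothetical operator interface whose setup receives only the operator config.
interface SketchOperator {
    void setup(OpConfig config);
}

final class RecordingOperator implements SketchOperator {
    String seenName;
    int seenChainIndex = -1;

    @Override
    public void setup(OpConfig config) {
        this.seenName = config.name;
        this.seenChainIndex = config.chainIndex;
    }
}
```

In this sketch the task reads the list once from the key-value store and hands each operator its own typed config, so the operator never touches the store itself.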

      Others have probably already thought about this, and someone may already be working
on it. I would appreciate your advice.

      Thanks a lot for replying. Best regards.
