flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From StephanEwen <...@git.apache.org>
Subject [GitHub] flink issue #2376: [FLINK-3755] Introduce key groups for key-value state to ...
Date Fri, 26 Aug 2016 16:47:30 GMT
Github user StephanEwen commented on the issue:

    Very good work and very nice code!
    Some comments after a joint review:
      - The most critical issue is that there should not be any blocking on async threads
during task shutdown. This unnecessarily delays responses to canceling and redeployment.
      - At this point, the `KeyGroupAssigner` interface seems a bit useless, especially if
it is not parametrized with variable key group mappings. For the sake of making this simpler
and more efficient, one could just have a static method for that.
      - I would suggest to make the assumption that key groups are always used (they should
be, even if their number is equal to the parallelism), and drop the checks for `numberOfKeyGroups
> 0`, for example in the KeyGroupHashPartitioner.
      - A bit more difficult is what to assume as the default number of key groups. We thought
about assuming a default of `128`. That has no overhead in state backends like RocksDB and
also allows initial job deployments which did not think about properly configuring this to
have some freedom to scale out. If the parallelism is >= 128, this should probably round
to the next highest power-of-two.
      - There are some log statements which cause log flooding, like an INFO log statement
for every checkpoint stream (factory) created.

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message