flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8360) Implement task-local state recovery
Date Tue, 09 Jan 2018 19:57:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319049#comment-16319049 ]

ASF GitHub Bot commented on FLINK-8360:
---------------------------------------

Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5239
  
    Thanks for going through the general design, @StephanEwen! As we discussed, I agree with
your first point. On the second point about RocksDB, this PR already contains an optimized
way to handle incremental local checkpoints. We did not discuss it in our review because
I thought it was too low-level a detail.
    It does not use duplicating streams. Instead, I introduced a state handle type for
a local directory. In fact, I also mapped the previous incremental recovery from DFS state
onto this new handle type: DFS state is first downloaded and then simply becomes a local
directory state handle. From there, both incremental recovery paths are identical.
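
    To illustrate the unification described above, here is a minimal sketch. All class and
method names are hypothetical and are not the ones used in the PR or in Flink; the remote
state is modelled as an in-memory map rather than a real DFS. It only shows the shape of the
idea: remote state is downloaded first and then becomes a local directory handle, so a single
restore routine serves both cases.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical names throughout; this is an illustration, not Flink's API.

/** State that already lives in a directory on the local disk. */
final class LocalDirectoryHandle {
    final Path directory;

    LocalDirectoryHandle(Path directory) {
        this.directory = directory;
    }
}

/** State stored remotely, modelled here as file name -> file contents. */
final class RemoteIncrementalHandle {
    final Map<String, byte[]> files;

    RemoteIncrementalHandle(Map<String, byte[]> files) {
        this.files = files;
    }

    /** Writes every file into targetDir; afterwards the state is just a local directory. */
    LocalDirectoryHandle downloadTo(Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                Files.write(targetDir.resolve(entry.getKey()), entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LocalDirectoryHandle(targetDir);
    }
}

final class IncrementalRestore {
    /**
     * The single restore path shared by local and (downloaded) remote state.
     * A real backend would open the database from this directory; the sketch
     * only returns the sorted names of the files it found there.
     */
    static List<String> restoreFromDirectory(LocalDirectoryHandle handle) {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(handle.directory)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }
}
```

    With this shape, a caller holding a remote handle calls `downloadTo(...)` and passes the
result to `restoreFromDirectory(...)`, while a caller holding a local copy passes its
directory handle directly.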


> Implement task-local state recovery
> -----------------------------------
>
>                 Key: FLINK-8360
>                 URL: https://issues.apache.org/jira/browse/FLINK-8360
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>             Fix For: 1.5.0
>
>
> This issue tracks the development of recovery from task-local state. The main idea is
to have a secondary, local copy of the checkpointed state, while there is still a primary
copy in DFS that we report to the checkpoint coordinator.
> Recovery can attempt to restore from the secondary local copy, if available, to save
network bandwidth. This requires that the assignment from tasks to slots is as sticky as possible.
> For starters, we will implement this feature for all managed keyed states; it can easily
be extended to all other state types (e.g. operator state) later.
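
The local-first recovery order described in the issue can be sketched as follows. This is
only an illustration under stated assumptions: `LocalFirstRecovery` and its parameters are
hypothetical names, not part of Flink, and the two copies are modelled as plain suppliers.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of Flink.
final class LocalFirstRecovery {
    /**
     * Tries the secondary (task-local) copy first. If there is no local copy,
     * or reading it fails, falls back to the primary copy in DFS.
     */
    static <S> S restore(Supplier<S> localCopy, Supplier<S> primaryCopy) {
        if (localCopy != null) {
            try {
                S state = localCopy.get();
                if (state != null) {
                    return state; // local hit: nothing is fetched from DFS
                }
            } catch (RuntimeException e) {
                // Local copy is missing or unreadable: fall through to the primary copy.
            }
        }
        return primaryCopy.get();
    }
}
```

The fallback keeps the primary DFS copy authoritative: a task that is rescheduled to a
different slot simply has no local copy and restores from DFS as before.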



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
