flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8360) Implement task-local state recovery
Date Fri, 05 Jan 2018 14:03:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313149#comment-16313149 ]

ASF GitHub Bot commented on FLINK-8360:

Github user pnowojski commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
    @@ -669,31 +676,14 @@ else if (current == ExecutionState.CANCELING) {
     			Environment env = new RuntimeEnvironment(
     				jobId, vertexId, executionId, executionConfig, taskInfo,
     				jobConfiguration, taskConfiguration, userCodeClassLoader,
    -				memoryManager, ioManager, broadcastVariableManager,
    +				memoryManager, ioManager, broadcastVariableManager, taskStateManager,
     				accumulatorRegistry, kvStateRegistry, inputSplitProvider,
     				distributedCacheEntries, writers, inputGates,
     				checkpointResponder, taskManagerConfig, metrics, this);
     			// let the task code create its readers and writers
    -			// the very last thing before the actual execution starts running is to inject
    -			// the state into the task. the state is non-empty if this is an execution
    -			// of a task that failed but had backuped state from a checkpoint
    -			if (null != taskRestore && taskRestore.getTaskStateSnapshot() != null) {
    -				if (invokable instanceof StatefulTask) {
    -					StatefulTask op = (StatefulTask) invokable;
    -					op.setInitialState(taskRestore.getTaskStateSnapshot());
    --- End diff --
    grrr, this cryptic `op` name forced me to look into the source code to check whether this
is an instance of `StatefulTask` or not :/ Could you rename it to something that at least
is not an abbreviation?

> Implement task-local state recovery
> -----------------------------------
>                 Key: FLINK-8360
>                 URL: https://issues.apache.org/jira/browse/FLINK-8360
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>             Fix For: 1.5.0
> This issue tracks the development of recovery from task-local state. The main idea is
to have a secondary, local copy of the checkpointed state, while there is still a primary
copy in DFS that we report to the checkpoint coordinator.
> Recovery can attempt to restore from the secondary local copy, if available, to save
network bandwidth. This requires that the assignment of tasks to slots is as sticky as possible.
> For starters, we will implement this feature for all managed keyed states and can easily
extend it to all other state types (e.g. operator state) later.
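
To make the recovery order described above concrete, here is a minimal Java sketch of "try the secondary local copy first, fall back to the primary copy in DFS". It is an illustration under stated assumptions, not the code from this pull request: `LocalRecoverySketch`, `StateStore`, `Restored` and `restore` are hypothetical names, and state is modelled as a plain string keyed by checkpoint id.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: none of these types are Flink classes.
final class LocalRecoverySketch {

    /** Hypothetical store mapping a checkpoint id to the serialized state of one task. */
    static final class StateStore {
        private final Map<Long, String> snapshots = new HashMap<>();

        void put(long checkpointId, String state) {
            snapshots.put(checkpointId, state);
        }

        /** Returns the stored state, or null if this store has no copy. */
        String get(long checkpointId) {
            return snapshots.get(checkpointId);
        }
    }

    /** Restored state plus a flag saying whether it came from the local copy. */
    static final class Restored {
        final String state;
        final boolean fromLocalCopy;

        Restored(String state, boolean fromLocalCopy) {
            this.state = state;
            this.fromLocalCopy = fromLocalCopy;
        }
    }

    /** Prefer the secondary local copy; otherwise read the primary copy from DFS. */
    static Restored restore(long checkpointId, StateStore local, StateStore dfs) {
        String localState = local.get(checkpointId);
        if (localState != null) {
            return new Restored(localState, true); // no network transfer needed
        }
        String primaryState = dfs.get(checkpointId);
        if (primaryState == null) {
            throw new IllegalStateException("No primary copy for checkpoint " + checkpointId);
        }
        return new Restored(primaryState, false);
    }

    public static void main(String[] args) {
        StateStore dfs = new StateStore();
        StateStore local = new StateStore();
        dfs.put(7L, "keyed-state");   // primary copy, reported to the coordinator
        local.put(7L, "keyed-state"); // secondary copy kept next to the task

        // Task restarted on the same slot: the local copy is found.
        System.out.println(restore(7L, local, dfs).fromLocalCopy);
        // Task rescheduled to a slot without a local copy: fall back to DFS.
        System.out.println(restore(7L, new StateStore(), dfs).fromLocalCopy);
    }
}
```

The second call in `main` shows why sticky slot assignment matters in this sketch: a task that lands on a different slot finds an empty local store and has to read the primary copy over the network.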

This message was sent by Atlassian JIRA
