flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6742) Improve error message when savepoint migration fails due to task removal
Date Sun, 25 Jun 2017 10:42:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062282#comment-16062282
] 

ASF GitHub Bot commented on FLINK-6742:
---------------------------------------

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4083#discussion_r123895607
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/savepoint/SavepointV2.java
---
    @@ -168,10 +168,27 @@ public static Savepoint convertToOperatorStateSavepointV2(
     				expandedToLegacyIds = true;
     			}
     
    +			if (jobVertex == null) {
    +				throw new IllegalStateException(
    +					"Could not find task for state with ID " + taskState.getJobVertexID() + ". " +
    +					"When migrating a savepoint from a version < 1.3 please make sure that the topology
was not " +
    +					"changed through removal of a stateful operator or modification of a chain containing
a stateful " +
    +					"operator.");
    +			}
    +
     			List<OperatorID> operatorIDs = jobVertex.getOperatorIDs();
     
     			for (int subtaskIndex = 0; subtaskIndex < jobVertex.getParallelism(); subtaskIndex++)
{
    -				SubtaskState subtaskState = taskState.getState(subtaskIndex);
    +				SubtaskState subtaskState;
    +				try {
    +					subtaskState = taskState.getState(subtaskIndex);
    --- End diff --
    
    Sorry for commenting late on this but I have had some major migration issues in the last
few days :D 
    I think we should explicitly compare parallelism instead of relying on the error:
    if (taskState.getStates().size() != jobVertex.getParallelism()) --> error
    
    Otherwise this will not fail on lower parallelism.


> Improve error message when savepoint migration fails due to task removal
> ------------------------------------------------------------------------
>
>                 Key: FLINK-6742
>                 URL: https://issues.apache.org/jira/browse/FLINK-6742
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Gyula Fora
>            Assignee: Chesnay Schepler
>            Priority: Minor
>              Labels: flink-rel-1.3.1-blockers
>
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2.convertToOperatorStateSavepointV2(SavepointV2.java:171)
> 	at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:75)
> 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1090)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message