flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6284) Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore
Date Fri, 12 May 2017 12:00:07 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008015#comment-16008015 ]

ASF GitHub Bot commented on FLINK-6284:
---------------------------------------

Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116211254
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -				for (String path : children) {
    -					path = "/" + path;
    +				List<String> childrenInStr =
    +					client.getZookeeperClient().getZooKeeper().
    +						getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +				List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +				for(String childNode : childrenInStr) {
    +					children.add(new Long(childNode));
    --- End diff --
    
    OK, I see. I am not sure about this MesosWorker. As for using cxid, I am not sure whether we have an API for it. If so, we can use it directly. Will be back.


> Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-6284
>                 URL: https://issues.apache.org/jira/browse/FLINK-6284
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Currently, all completed checkpoints are sorted by their paths when they are recovered in {{ZooKeeperCompletedCheckpointStore}}. When the latest checkpoint's id is not the largest in lexical order (e.g., "100" is smaller than "99" in lexical order), Flink will not recover from the latest completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to be 99, 100 and 101.
> To fix the problem, we should explicitly sort the found checkpoints by their checkpoint ids, without using {{ZooKeeperStateHandleStore#getAllSortedByName()}}.
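
The ordering problem described in the issue can be reproduced without ZooKeeper. The following is a minimal, standalone sketch, not code from Flink or from the pull request: the class name `CheckpointSortSketch` and its two helper methods are hypothetical, and the checkpoint ids are represented as plain strings, the way they would appear as path names. It only illustrates why sorting by name picks the wrong "latest" checkpoint for the ids 99, 100 and 101, and how comparing the ids as numbers avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
final class CheckpointSortSketch {

    // Sorting by name: plain String (lexical) ordering of the ids.
    static List<String> sortLexically(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorting by checkpoint id: parse each name and compare as long values.
    static List<String> sortNumerically(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingLong((String id) -> Long.parseLong(id)));
        return sorted;
    }

    public static void main(String[] args) {
        // The ids suggested in the issue description.
        List<String> ids = Arrays.asList("99", "100", "101");

        List<String> byName = sortLexically(ids);
        List<String> byId = sortNumerically(ids);

        // The last element is what a "take the latest" recovery would pick.
        System.out.println("by name: " + byName + ", latest = " + byName.get(byName.size() - 1));
        System.out.println("by id:   " + byId + ", latest = " + byId.get(byId.size() - 1));
    }
}
```

With these three ids, the name-based ordering ends with "99", whereas the id-based ordering ends with "101", which is the checkpoint that recovery should start from.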



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
