Date: Fri, 12 May 2017 11:45:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-6284) Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore

    [ https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008000#comment-16008000 ]

ASF GitHub Bot commented on FLINK-6284:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116208844

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
             } else {
                 // Initial cVersion (number of changes to the children of this node)
                 int initialCVersion = stat.getCversion();
    -
    -            List children = ZKPaths.getSortedChildren(
    -                    client.getZookeeperClient().getZooKeeper(),
    -                    ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -            for (String path : children) {
    -                path = "/" + path;
    +            List childrenInStr =
    +                    client.getZookeeperClient().getZooKeeper().
    +                    getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +            List children = new ArrayList(childrenInStr.size());
    +            for(String childNode : childrenInStr) {
    +                children.add(new Long(childNode));
    --- End diff --

    I'm not sure whether we can assume that the children paths are always longs. In the general case this is not true (see `ZooKeeperMesosWorkerStore`).


> Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-6284
>                 URL: https://issues.apache.org/jira/browse/FLINK-6284
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Currently, all completed checkpoints are sorted by their paths when they are recovered in {{ZooKeeperCompletedCheckpointStore}}. When the latest checkpoint's id is not the largest in lexical order (e.g., "100" is smaller than "99" in lexical order), Flink will not recover from the latest completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to be 99, 100 and 101.
> To fix the problem, we should explicitly sort the found checkpoints by their checkpoint ids, without using {{ZooKeeperStateHandleStore#getAllSortedByName()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
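
For illustration only, here is a minimal standalone sketch of the ordering problem described in the issue, together with one possible way to address the review concern that child names are not always longs. It does not use Flink or ZooKeeper APIs. The class `CheckpointOrderSketch`, the comparator `NUMERIC_FIRST`, and the helper methods are hypothetical names, and the policy of placing non-numeric names after numeric ones is an assumption, not something proposed in the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CheckpointOrderSketch {

    // Hypothetical helper: true only for plain decimal names that are
    // guaranteed to fit into a long (at most 18 digits).
    static boolean isNumeric(String s) {
        if (s.isEmpty() || s.length() > 18) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Assumed policy: numeric names are ordered by value and come first;
    // non-numeric names follow in lexical order instead of causing a
    // NumberFormatException.
    static final Comparator<String> NUMERIC_FIRST = (a, b) -> {
        boolean aNum = isNumeric(a);
        boolean bNum = isNumeric(b);
        if (aNum && bNum) {
            return Long.compare(Long.parseLong(a), Long.parseLong(b));
        }
        if (aNum != bNum) {
            return aNum ? -1 : 1;
        }
        return a.compareTo(b);
    };

    // Plain string ordering, which is what sorting by name amounts to.
    static List<String> sortedLexically(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    // Ordering by numeric id, tolerating non-numeric names.
    static List<String> sortedById(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(NUMERIC_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("99", "100", "101");
        System.out.println(sortedLexically(names)); // [100, 101, 99]
        System.out.println(sortedById(names));      // [99, 100, 101]
    }
}
```

With the ids 99, 100 and 101 from the issue description, the lexical order ends with "99", so the last element is not the latest checkpoint, while the numeric order ends with "101". Whether a tolerant comparator like this or a sort on the deserialized checkpoint ids is preferable is a design question for the pull request itself.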