Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 24482185E0 for ; Tue, 30 Jun 2015 14:24:20 +0000 (UTC) Received: (qmail 84859 invoked by uid 500); 30 Jun 2015 14:24:06 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 84804 invoked by uid 500); 30 Jun 2015 14:24:06 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 84616 invoked by uid 99); 30 Jun 2015 14:24:05 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Jun 2015 14:24:05 +0000 Date: Tue, 30 Jun 2015 14:24:05 +0000 (UTC) From: "Hadoop QA (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-12865) WALs may be deleted before they are replicated to peers MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608362#comment-14608362 ] Hadoop QA commented on HBASE-12865: ----------------------------------- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12742805/HBASE-12865-V1.diff against master branch at commit f8bd578b80b4e656d799c82ca1b6191e35bb0ae4. ATTACHMENT ID: 12742805 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14616//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14616//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14616//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14616//console This message is automatically generated. > WALs may be deleted before they are replicated to peers > ------------------------------------------------------- > > Key: HBASE-12865 > URL: https://issues.apache.org/jira/browse/HBASE-12865 > Project: HBase > Issue Type: Bug > Components: Replication > Reporter: Liu Shaohui > Assignee: He Liangliang > Priority: Critical > Attachments: HBASE-12865-V1.diff > > > By design, ReplicationLogCleaner guarantee that the WALs being in replication queue can't been deleted by the HMaster. The ReplicationLogCleaner gets the WAL set from zookeeper by scanning the replication zk node. But it may get uncompleted WAL set during replication failover for the scan operation is not atomic. > For example: There are three region servers: rs1, rs2, rs3, and peer id 10. The layout of replication zookeeper nodes is: > {code} > /hbase/replication/rs/rs1/10/wals > /rs2/10/wals > /rs3/10/wals > {code} > - t1: the ReplicationLogCleaner finished scanning the replication queue of rs1, and start to scan the queue of rs2. > - t2: region server rs3 is down, and rs1 take over rs3's replication queue. The new layout is > {code} > /hbase/replication/rs/rs1/10/wals > /rs1/10-rs3/wals > /rs2/10/wals > /rs3 > {code} > - t3, the ReplicationLogCleaner finished scanning the queue of rs2, and start to scan the node of rs3. But the the queue has been moved to "replication/rs1/10-rs3/WALS" > So the ReplicationLogCleaner will miss the WALs of rs3 in peer 10 and the hmaster may delete these WALs before they are replicated to peer clusters. > We encountered this problem in our cluster and I think it's a serious bug for replication. > Suggestions are welcomed to fix this bug. thx~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)