From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Date: Fri, 9 Aug 2013 02:28:48 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-9158) Serious bug in cyclic replication

[ https://issues.apache.org/jira/browse/HBASE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734338#comment-13734338 ]

Hudson commented on HBASE-9158:
-------------------------------

FAILURE: Integrated in hbase-0.95 #419 (See [https://builds.apache.org/job/hbase-0.95/419/])
HBASE-9158 Serious bug in cyclic replication (larsh: rev 1512090)
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java

> Serious bug in cyclic replication
> ---------------------------------
>
> Key: HBASE-9158
> URL:
https://issues.apache.org/jira/browse/HBASE-9158
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0, 0.95.1, 0.94.10
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 0.98.0, 0.95.2, 0.94.11
>
> Attachments: 9158-0.94.txt, 9158-0.94-v2.txt, 9158-0.94-v3.txt, 9158-0.94-v4.txt, 9158-trunk-v1.txt, 9158-trunk-v2.txt, 9158-trunk-v3.txt, 9158-trunk-v4.txt
>
>
> While studying the code for HBASE-7709, I found a serious bug in the current cyclic replication code. The problem is here in HRegion.doMiniBatchMutation:
> {code}
> Mutation first = batchOp.operations[firstIndex].getFirst();
> txid = this.log.appendNoSync(regionInfo, this.htableDescriptor.getName(),
>     walEdit, first.getClusterId(), now, this.htableDescriptor);
> {code}
> Note that edits replicated from a remote cluster and local edits might interleave in the WAL, and we might also receive edits from multiple remote clusters. Hence a single batch might contain edits from many clusters, but all of them are labeled with the clusterId of the first Mutation.
> Fixing this in doMiniBatchMutation seems tricky to do efficiently (imagine we get a batch with cluster1, cluster2, cluster1, cluster2, ...; in that case each edit would have to be its own batch). The coprocessor handling would also be difficult.
> The other option is to create batches of Puts grouped by cluster id in ReplicationSink.replicateEntries(...). This is not as general, but equally correct, and it is the approach I would favor.
> Lastly, this is very hard to verify in a unit test.
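The grouping approach favored in the description can be illustrated in plain Java. This is a minimal, hypothetical sketch, not the committed patch: `Edit`, `labelOfBatch`, and `groupByClusterId` are illustrative stand-ins for HBase's WAL entries and ReplicationSink logic, and the cluster ids are random UUIDs. `labelOfBatch` mimics the "label the whole batch with the first Mutation's cluster id" pattern quoted above, while `groupByClusterId` splits interleaved edits into one batch per originating cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: Edit, labelOfBatch and groupByClusterId are illustrative
// stand-ins, not HBase's HRegion/ReplicationSink APIs.
public class CyclicReplicationSketch {

  /** A replicated edit tagged with the id of the cluster it originated on. */
  static final class Edit {
    final UUID clusterId;
    final String row;

    Edit(UUID clusterId, String row) {
      this.clusterId = clusterId;
      this.row = row;
    }
  }

  /** The problematic pattern: a whole batch is labeled with its first edit's cluster id. */
  static UUID labelOfBatch(List<Edit> batch) {
    return batch.get(0).clusterId;
  }

  /**
   * Splits edits into batches that each hold edits from exactly one cluster.
   * LinkedHashMap keeps batches in first-seen order of cluster ids, and each
   * batch keeps its edits in their original relative order.
   */
  static Map<UUID, List<Edit>> groupByClusterId(List<Edit> edits) {
    Map<UUID, List<Edit>> batches = new LinkedHashMap<>();
    for (Edit e : edits) {
      batches.computeIfAbsent(e.clusterId, id -> new ArrayList<>()).add(e);
    }
    return batches;
  }

  public static void main(String[] args) {
    UUID c1 = UUID.randomUUID();
    UUID c2 = UUID.randomUUID();
    // Interleaved edits from two source clusters (cluster1, cluster2, cluster1, cluster2).
    List<Edit> incoming = Arrays.asList(
        new Edit(c1, "r1"), new Edit(c2, "r2"), new Edit(c1, "r3"), new Edit(c2, "r4"));

    // One label for the mixed batch: the edits from c2 are attributed to c1.
    System.out.println("mixed batch labeled c1: " + labelOfBatch(incoming).equals(c1));

    // One batch per cluster id: each batch can carry a single, correct label.
    for (Map.Entry<UUID, List<Edit>> batch : groupByClusterId(incoming).entrySet()) {
      StringBuilder rows = new StringBuilder();
      for (Edit e : batch.getValue()) {
        rows.append(e.row).append(' ');
      }
      String name = batch.getKey().equals(c1) ? "c1" : "c2";
      System.out.println(name + ": " + rows.toString().trim());
    }
  }
}
```

Running `main` prints `mixed batch labeled c1: true`, then `c1: r1 r3` and `c2: r2 r4`. The grouping preserves the order of edits within each cluster's batch, but not the interleaving across clusters; that is the trade-off of batching per cluster id instead of fixing the labeling inside doMiniBatchMutation.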