Date: Thu, 17 Dec 2015 21:34:46 +0000 (UTC)
From: "Ashu Pachauri (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-15001) Thread Safety issues in ReplicationSinkManager and HBaseInterClusterReplicationEndpoint

     [ https://issues.apache.org/jira/browse/HBASE-15001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashu Pachauri updated HBASE-15001:
----------------------------------
    Attachment: HBASE-15001-V0.patch

V0: Makes operations in ReplicationSinkManager synchronized and adds a verification on total number of replicated edits in HBaseInterClusterReplicationEndpoint before reporting success.
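
For illustration only, here is a minimal sketch of the two ideas named in this issue: synchronized access to the sink list, and a check on the number of shipped edits before reporting success. It is not taken from HBASE-15001-V0.patch. SinkTracker and ShipPlanner are hypothetical stand-ins, not HBase classes, and the inner loop only counts entries where a real endpoint would ship them. The batch-count formula is the one quoted in the issue description, but it is fed from a single snapshot of the sinks, so the emptiness check and the batch count cannot disagree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a sink manager; NOT the HBase ReplicationSinkManager.
final class SinkTracker {
    private final List<String> sinks = new ArrayList<>();

    SinkTracker(List<String> initial) {
        sinks.addAll(initial);
    }

    // Returns an immutable copy, so a caller works with one consistent view
    // even if other threads report bad sinks afterwards.
    synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(sinks));
    }

    // Removes a sink under the same lock that guards snapshot().
    synchronized void reportBadSink(String sink) {
        sinks.remove(sink);
    }
}

// Hypothetical stand-in for the shipping logic of a replication endpoint.
final class ShipPlanner {
    // Same formula as the snippet quoted in the issue, with the sink count
    // passed in instead of being re-read from shared state.
    static int batchCount(int maxThreads, int entryCount, int sinkCount) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }

    // Returns true only when every entry was handed to some batch.
    static boolean replicate(SinkTracker tracker, List<String> entries, int maxThreads) {
        List<String> sinks = tracker.snapshot(); // read shared state exactly once
        if (sinks.isEmpty()) {
            return false;
        }
        int n = batchCount(maxThreads, entries.size(), sinks.size());
        int shipped = 0;
        for (int batch = 0; batch < n; batch++) {
            // Round-robin partition: batch i takes entries i, i+n, i+2n, ...
            for (int j = batch; j < entries.size(); j += n) {
                shipped++; // placeholder for actually shipping entries.get(j)
            }
        }
        // Verification step: zero batches must never be reported as success.
        return shipped == entries.size();
    }
}
```

With this shape, a batch count of zero leaves shipped at zero, so a non-empty list of entries is reported as a failure rather than a success.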
> Thread Safety issues in ReplicationSinkManager and HBaseInterClusterReplicationEndpoint
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-15001
>                 URL: https://issues.apache.org/jira/browse/HBASE-15001
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.2.0, 1.3.0, 1.2.1
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>            Priority: Critical
>         Attachments: HBASE-15001-V0.patch
>
>
> ReplicationSinkManager is not thread-safe. This can cause problems in HBaseInterClusterReplicationEndpoint when the walprovider is multiwal.
> For example:
> 1. When multiple threads report bad sinks, the sink list can be non-empty but report a negative size, because the ArrayList itself is not thread-safe.
> 2. HBaseInterClusterReplicationEndpoint depends on the number of sinks to batch edits for shipping. However, the following code can lead it to assume that there are no batches to process: the sink size is non-zero at the first check, but by the time we reach the "batching" part it has become zero.
> {code}
> if (replicationSinkMgr.getSinks().size() == 0) {
>   return false;
> }
> ...
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>                replicationSinkMgr.getSinks().size());
> {code}
> This is very dangerous because, with no batches to process, we report that replication succeeded when we actually did not replicate anything.
> The idea is to make all operations in ReplicationSinkManager thread-safe and to verify the number of replicated edits before we report success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)