Date: Tue, 8 Dec 2015 22:56:11 +0000 (UTC)
From: "Ashu Pachauri (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-14953) HBaseInterClusterReplicationEndpoint: Do not retry the whole batch of edits in case of RejectedExecutionException

Ashu Pachauri created HBASE-14953:
-------------------------------------

             Summary: HBaseInterClusterReplicationEndpoint: Do not retry the whole batch of edits in case of RejectedExecutionException
                 Key: HBASE-14953
                 URL: https://issues.apache.org/jira/browse/HBASE-14953
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 2.0.0, 1.2.0, 1.3.0
            Reporter: Ashu Pachauri
            Assignee: Ashu Pachauri

When we have wal provider set to multiwal, the ReplicationSource has multiple worker threads submitting batches to
HBaseInterClusterReplicationEndpoint. In this scenario, RejectedExecutionException is quite common, because shipping edits to the peer cluster takes considerably longer than reading edits from the source and submitting more batches to the endpoint. The logs fill up with warnings caused by this one exception.

Since we subdivide batches before actually shipping them, we don't need to fail and resend the whole batch if one of the sub-batches fails with RejectedExecutionException. Rather, we should just retry the failed sub-batches.
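
The proposal can be illustrated with a small standalone sketch. This is not the HBaseInterClusterReplicationEndpoint code: the class SubBatchRetrySketch, the methods split and shipAll, the maxRounds limit, and the "shipping" step (appending to a list) are all hypothetical stand-ins, and only the java.util.concurrent calls are real APIs. It assumes a deliberately small thread pool so that submissions can be rejected, and shows that only the rejected sub-batches are submitted again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SubBatchRetrySketch {

  /** Splits a batch into sub-batches of at most {@code size} entries. */
  static <T> List<List<T>> split(List<T> batch, int size) {
    List<List<T>> subBatches = new ArrayList<>();
    for (int i = 0; i < batch.size(); i += size) {
      subBatches.add(new ArrayList<>(batch.subList(i, Math.min(i + size, batch.size()))));
    }
    return subBatches;
  }

  /**
   * Submits every sub-batch to the pool and resubmits only the ones the pool
   * rejected. "Shipping" is simulated by appending to {@code shipped}.
   * Returns the number of rejections seen; gives up after {@code maxRounds}.
   */
  static <T> int shipAll(List<List<T>> subBatches, ThreadPoolExecutor pool,
                         List<T> shipped, int maxRounds) {
    List<List<T>> pending = new ArrayList<>(subBatches);
    int rejections = 0;
    for (int round = 0; !pending.isEmpty(); round++) {
      if (round >= maxRounds) {
        throw new RejectedExecutionException(pending.size() + " sub-batches still rejected");
      }
      List<List<T>> rejected = new ArrayList<>();
      List<Future<?>> inFlight = new ArrayList<>();
      for (List<T> sub : pending) {
        try {
          inFlight.add(pool.submit(() -> {
            synchronized (shipped) {
              shipped.addAll(sub); // hypothetical stand-in for shipping to the peer
            }
          }));
        } catch (RejectedExecutionException e) {
          rejected.add(sub); // remember only this sub-batch, not the whole batch
          rejections++;
        }
      }
      // Wait for the accepted sub-batches so the pool has room for the retries.
      for (Future<?> f : inFlight) {
        try {
          f.get();
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException("shipping failed", e);
        }
      }
      pending = rejected; // the next round retries the rejected sub-batches only
    }
    return rejections;
  }

  public static void main(String[] args) {
    List<Integer> batch = new ArrayList<>();
    for (int i = 0; i < 50; i++) {
      batch.add(i);
    }
    // One thread and a one-slot queue, so rejections are likely under load.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
    List<Integer> shipped = new ArrayList<>();
    int rejections = shipAll(split(batch, 5), pool, shipped, 100);
    pool.shutdown();
    System.out.println("shipped=" + shipped.size() + " rejections=" + rejections);
  }
}
```

The number of rejections varies from run to run because it depends on thread timing, but each entry is shipped exactly once. A real endpoint would presumably also need backoff between rounds and handling for sub-batches that fail for reasons other than rejection, which this sketch leaves out.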