Date: Wed, 10 Sep 2014 18:51:33 +0000 (UTC)
From: "Jesse Yates (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-11935) Unbounded creation of Replication Failover workers

     [ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-11935:
--------------------------------
    Attachment: hbase-11935-0.98-v0.patch

Attaching fix for 0.98. Worked up with [~lhofhansl], [~apurtell] and the rest of the folks at Salesforce.

Not sure if there is a good way to test for this behavior that isn't completely pointless.

> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>         Attachments: hbase-11935-0.98-v0.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181 (ZooKeeper).
> In our case the slave cluster was not running. When we bounced the primary cluster we saw an "unbounded" number of failover threads all hammering the slave ZK hosts (which did not run ZK at the time), causing overall degradation of network performance between datacenters.
> Looking at the code we noticed that the thread pool handling of the Failover workers was probably unintended.
> Patch coming soon.
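
For readers following the thread, here is a minimal sketch of what a bounded worker pool looks like in plain Java. It is not the attached patch and does not use HBase code: the class name `BoundedFailoverPool`, the helper names, and the pool size are all hypothetical, chosen only to illustrate capping the number of concurrently running failover-style workers so that extra tasks queue up instead of each getting a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the HBase source or the patch.
class BoundedFailoverPool {

    // Builds a pool that never runs more than maxWorkers threads.
    // core == max, so the unbounded queue holds the overflow instead of
    // new threads being spawned; idle threads exit after the keep-alive.
    static ThreadPoolExecutor newBoundedPool(int maxWorkers) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxWorkers, maxWorkers,
                100, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` short jobs and reports the highest number that were
    // observed running at the same time.
    static int peakConcurrency(int maxWorkers, int tasks) {
        ThreadPoolExecutor pool = newBoundedPool(maxWorkers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // stand-in for a slow connection attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With a cap of 2, the reported peak should never exceed 2.
        System.out.println("peak concurrent workers: " + peakConcurrency(2, 20));
    }
}
```

The point of the sketch is the constructor arguments: with equal core and maximum sizes, the queue absorbs bursts, so a cluster bounce that schedules many failover tasks at once would open at most `maxWorkers` connections at a time under these assumptions.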