hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11935) ZooKeeper connection storm after queue failover with slave cluster down
Date Fri, 12 Sep 2014 17:48:37 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-11935:
-----------------------------------
    Attachment:     (was: hbase-11935-0.98-v0.patch)

> ZooKeeper connection storm after queue failover with slave cluster down
> -----------------------------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>
> We just ran into a production incident with TCP SYN storms on port 2181 (ZooKeeper).
> In our case the slave cluster was not running. When we bounced the primary cluster we
> saw an "unbounded" number of failover threads all hammering the slave ZK hosts (which
> were not running ZK at the time), causing overall degradation of network performance
> between datacenters.
> Looking at the code, we noticed that the thread pool handling of the failover workers
> was probably unintended.
> Patch coming soon.
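
For readers unfamiliar with the failure mode described above, here is a minimal sketch of a worker pool whose thread count is capped. It is not the HBASE-11935 patch and does not reproduce HBase's replication code: the class name BoundedFailoverPool, the helper newBoundedPool, the method largestPoolSize, and the sleeping placeholder task are all hypothetical. It relies only on the standard java.util.concurrent.ThreadPoolExecutor, configured with equal core and maximum sizes and an unbounded queue, so extra work waits in the queue instead of spawning more threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedFailoverPool {

    // Hypothetical helper: a pool that never runs more than maxWorkers threads.
    // Core size == max size, so tasks beyond maxWorkers queue up rather than
    // creating new threads (and, in the scenario above, new connections).
    static ThreadPoolExecutor newBoundedPool(int maxWorkers) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxWorkers, maxWorkers,
                100, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        // Let idle workers exit so the pool shrinks to zero when there is no work.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` placeholder jobs and reports the largest number of
    // threads the pool ever held at once.
    static int largestPoolSize(int maxWorkers, int tasks) {
        ThreadPoolExecutor pool = newBoundedPool(maxWorkers);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20); // stand-in for a failover worker's job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + largestPoolSize(2, 20));
    }
}
```

With this configuration, the number of threads stays at or below maxWorkers no matter how many tasks are submitted. Whether queuing or rejecting excess work is the right policy for failover workers is a design choice the issue text does not settle.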



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
