hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11935) Unbounded creation of Replication Failover workers
Date Wed, 10 Sep 2014 19:08:34 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-11935:
-----------------------------------
    Attachment: hbase-11935-trunk-v0.patch

Patch for trunk

Replication tests pass locally:
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.replication.TestMasterReplication
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.946 sec - in org.apache.hadoop.hbase.replication.TestMasterReplication
Running org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.37 sec - in org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Running org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.85 sec - in org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Running org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.904 sec - in org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
Running org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.352 sec - in org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
Running org.apache.hadoop.hbase.replication.TestReplicationEndpoint
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.599 sec - in org.apache.hadoop.hbase.replication.TestReplicationEndpoint
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.607 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.881 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
Running org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.436 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
Running org.apache.hadoop.hbase.replication.TestReplicationSmallTests
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.888 sec - in org.apache.hadoop.hbase.replication.TestReplicationSmallTests
Running org.apache.hadoop.hbase.replication.TestReplicationSource
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.846 sec - in org.apache.hadoop.hbase.replication.TestReplicationSource
Running org.apache.hadoop.hbase.replication.TestReplicationStateZKImpl
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.465 sec - in org.apache.hadoop.hbase.replication.TestReplicationStateZKImpl
Running org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.017 sec - in org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
Running org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl
Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 1.738 sec - in org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl
Running org.apache.hadoop.hbase.replication.TestReplicationWALEntryFilters
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.122 sec - in org.apache.hadoop.hbase.replication.TestReplicationWALEntryFilters
Running org.apache.hadoop.hbase.replication.TestReplicationWithTags
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.966 sec - in org.apache.hadoop.hbase.replication.TestReplicationWithTags

Results :

Tests run: 40, Failures: 0, Errors: 0, Skipped: 2
{noformat}

> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>         Attachments: hbase-11935-0.98-v0.patch, hbase-11935-trunk-v0.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181 (zookeeper).
> In our case the slave cluster was not running. When we bounced the primary cluster we saw an "unbounded" number of failover threads all hammering the hosts on the slave ZK machines (which did not run ZK at the time), causing overall degradation of network performance between datacenters.
> Looking at the code we noticed that the thread pool handling of the Failover workers was probably unintended.
> Patch coming soon.
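
For readers unfamiliar with the failure mode described above, here is a minimal sketch of a bounded worker pool. It is not the attached patch and does not reproduce HBase code: the class name BoundedFailoverPool, the method runAndMeasurePeak, and the sleep that stands in for a connection attempt are all hypothetical. It only illustrates the general idea that, with a fixed cap on pool threads and a queue for the overflow, a burst of failover jobs waits in the queue rather than opening connections all at once.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the HBase source or the patch.
public class BoundedFailoverPool {

    /**
     * Submits {@code tasks} simulated failover jobs to a pool capped at
     * {@code maxWorkers} threads and returns the highest number of jobs
     * observed running at the same time.
     */
    public static int runAndMeasurePeak(int maxWorkers, int tasks) {
        // core == max with an unbounded queue: extra jobs wait in the queue
        // instead of causing additional threads to be created.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxWorkers, maxWorkers, 100, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // idle workers may exit

        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the high-water mark
                try {
                    Thread.sleep(10); // stand-in for a connection attempt (assumption)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent workers: " + runAndMeasurePeak(2, 20));
    }
}
```

With a cap of 2 and 20 submitted jobs, the measured peak should never exceed 2, however many failovers are queued at once.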



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
