hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
Date Sat, 08 Nov 2014 01:17:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203087#comment-14203087 ]

Enis Soztutar commented on HBASE-12450:
---------------------------------------

Admin.balancer() may throw exceptions other than ServiceException (see HBASE-12072), so
we should just catch Exception there. Other than that, looks good.

> Unbalance chaos monkey might kill all region servers without starting them back
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-12450
>                 URL: https://issues.apache.org/jira/browse/HBASE-12450
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Virag Kothari
>            Assignee: Virag Kothari
>            Priority: Minor
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>         Attachments: HBASE-12450-0.98.patch, HBASE-12450.patch, HBASE-12450.patch
>
>
> UnbalanceKillAndRebalanceAction kills region servers, runs the balancer, and then starts
the servers again. But if the balance fails, an exception is thrown and the region servers are
never started. For me, the balance kept failing with a socket timeout (default 1 min) because
the master runs one iteration of balance for 5 mins (default config). Eventually all servers
are killed but never started back.
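
The failure mode described above, and the suggestion to catch Exception around the balance call, can be sketched roughly as follows. This is not the attached patch: the Cluster interface, the UnbalanceSketch class, and the killBalanceRestart method are hypothetical stand-ins for the real chaos monkey action and Admin calls, used only to show that a failed balance must not prevent the killed servers from being started again.

```java
import java.util.ArrayList;
import java.util.List;

public class UnbalanceSketch {

    /** Hypothetical stand-in for the cluster operations the action needs. */
    interface Cluster {
        void killServer(String server);
        void balance() throws Exception; // may fail, e.g. with a socket timeout
        void startServer(String server);
    }

    /** Kills the victims, attempts one balance, and restarts every killed server. */
    static void killBalanceRestart(Cluster cluster, List<String> victims) {
        List<String> killed = new ArrayList<>();
        try {
            for (String server : victims) {
                cluster.killServer(server);
                killed.add(server);
            }
            try {
                cluster.balance();
            } catch (Exception e) {
                // Broad catch on purpose: the balance call may throw more than
                // one exception type, and none of them should skip the restart.
                System.err.println("Balance failed, restarting servers anyway: " + e);
            }
        } finally {
            // Runs whether or not the kill or balance steps failed.
            for (String server : killed) {
                cluster.startServer(server);
            }
        }
    }
}
```

With this shape, a balance that times out is logged and the servers recorded in the killed list are still started, so repeated runs do not gradually drain the cluster.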



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
