hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore
Date Fri, 04 Oct 2013 03:39:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785869#comment-13785869 ]

Hudson commented on HBASE-9703:
-------------------------------

FAILURE: Integrated in hbase-0.96 #122 (See [https://builds.apache.org/job/hbase-0.96/122/])
HBASE-9703 DistributedHBaseCluster should not throw exceptions, but do a best effort restore
(enis: rev 1529046)
* /hbase/branches/0.96/hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java


> DistributedHBaseCluster should not throw exceptions, but do a best effort restore
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-9703
>                 URL: https://issues.apache.org/jira/browse/HBASE-9703
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: hbase-9703_v1.patch, hbase-9703_v3.patch
>
>
> At the end of integration tests, we call DistributedCluster.restoreCluster() in case
> CM (ChaosMonkey) has killed nodes, so that we leave the cluster in the same state in
> which we took it over.
> However, if CM is not used in a test (for example ITLoadAndVerify), but some region
> servers die, or an external daemon kills the servers, we will still try to restore at
> the end of the test, which may or may not succeed (depending on configuration, whether
> the region server is reachable, etc.).
> We can do one of two things: either do a best-effort cluster restore that does not fail
> the test if there are any errors, or skip the restore if no disruptive actions have
> taken place.
> I am leaning towards the former, since if an RS goes down, with or without CM, due to a
> bad disk etc., we cannot restore the cluster, but we should not fail the test in this case.
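
For illustration only, the "best effort" option described above could be sketched roughly as follows. This is not the code from the attached patches: the class BestEffortRestore, the RestoreAction interface and the restoreBestEffort method are hypothetical names made up for this example, and a plain list of strings stands in for a real logger. The idea is simply that each restore step is attempted, a failure is recorded rather than rethrown, and the caller gets a boolean instead of an exception.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BestEffortRestore {

    /** Hypothetical stand-in for one restore step, e.g. restarting a region server. */
    public interface RestoreAction {
        void run() throws IOException;
    }

    /**
     * Runs every restore step. A failing step is recorded in the log and the
     * remaining steps are still attempted, so nothing is thrown to the caller.
     *
     * @return true only if every step succeeded
     */
    public static boolean restoreBestEffort(List<RestoreAction> actions, List<String> log) {
        boolean success = true;
        for (RestoreAction action : actions) {
            try {
                action.run();
            } catch (IOException e) {
                // Best effort: note the failure and keep going.
                success = false;
                log.add("Restore step failed: " + e.getMessage());
            }
        }
        return success;
    }

    public static void main(String[] args) {
        List<RestoreAction> actions = new ArrayList<>();
        actions.add(() -> System.out.println("restarted server A"));
        actions.add(() -> { throw new IOException("server B is unreachable"); });
        actions.add(() -> System.out.println("restarted server C"));

        List<String> log = new ArrayList<>();
        boolean restored = restoreBestEffort(actions, log);

        // The test teardown can report the outcome without failing the test.
        System.out.println("fully restored: " + restored);
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

With this shape, a test's teardown can log a warning when the return value is false instead of failing, which matches the behaviour the description leans towards.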



--
This message was sent by Atlassian JIRA
(v6.1#6144)
