Date: Thu, 3 Oct 2013 01:06:41 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9703) DistributedHBaseCluster should not restore the cluster if CM is not used

    [ https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784718#comment-13784718 ]

Nick Dimiduk commented on HBASE-9703:
-------------------------------------

I prefer the former. If the test has taken no destructive actions, it is potentially masking other system issues from the operator.

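For illustration only (this is not the patch attached to the issue), the two options from the quoted description can be sketched roughly as below: a best-effort restore that never fails the test, and a restore that is skipped when no disruptive action was taken. All names here (`RestoreSketch`, `ClusterRestorer`, `Outcome`, `bestEffortRestore`, `restoreIfDisrupted`) are hypothetical and are not HBase APIs; the actual restore call is replaced by a stand-in callback, and the "disruptive action taken" flag is assumed to be tracked by the test harness.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the HBase code base. */
class RestoreSketch {

    /** Stand-in for whatever performs the real cluster restore. */
    @FunctionalInterface
    interface ClusterRestorer {
        void restore() throws IOException;
    }

    enum Outcome { RESTORED, RESTORE_FAILED_IGNORED, SKIPPED }

    /** Option 1: always attempt the restore, but log failures instead of failing the test. */
    static Outcome bestEffortRestore(ClusterRestorer restorer) {
        try {
            restorer.restore();
            return Outcome.RESTORED;
        } catch (IOException e) {
            System.err.println("Restore failed, not failing the test: " + e.getMessage());
            return Outcome.RESTORE_FAILED_IGNORED;
        }
    }

    /** Option 2: restore only if the test itself was disruptive; failures propagate. */
    static Outcome restoreIfDisrupted(boolean disruptiveActionTaken, ClusterRestorer restorer)
            throws IOException {
        if (!disruptiveActionTaken) {
            return Outcome.SKIPPED;
        }
        restorer.restore();
        return Outcome.RESTORED;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a region server that cannot be brought back.
        ClusterRestorer failing = () -> { throw new IOException("region server unreachable"); };
        System.out.println(bestEffortRestore(failing));          // failure is swallowed
        System.out.println(restoreIfDisrupted(false, failing));  // restore is never attempted
    }
}
```

In this sketch, option 1 hides a dead region server behind a log line, while option 2 never touches the cluster unless the test itself disturbed it; which behavior is preferable is the question under discussion in this thread.
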
> DistributedHBaseCluster should not restore the cluster if CM is not used
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9703
>                 URL: https://issues.apache.org/jira/browse/HBASE-9703
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.1
>
>
> At the end of integration tests, we call DistributedCluster.restoreCluster() in case CM has killed nodes, so that we leave the cluster in the same state in which we took it over.
> However, if CM is not used in a test (for example ITLoadAndVerify), but some region servers die, or an external daemon kills the servers, we will still try to restore at the end of the test, which may or may not succeed (depending on configuration, the region server being inaccessible, etc.).
> We can do one of two things: either do a best-effort cluster restore that does not fail the test if there are any errors, or skip running restore if no disruptive actions have taken place.
> I am leaning towards the former, since if an RS goes down, with or without CM, due to a bad disk etc., we cannot restore the cluster, but we should not fail the test in this case.

--
This message was sent by Atlassian JIRA
(v6.1#6144)