Date: Tue, 30 Jul 2013 09:49:49 +0000 (UTC)
From: "gautam (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-9085) Integration Tests fails because of bug in teardown phase where the cluster state is not being restored properly.

     [ https://issues.apache.org/jira/browse/HBASE-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gautam updated HBASE-9085:
--------------------------

    Description:
I was running the following test on a distributed cluster:
bin/hbase org.apache.hadoop.hbase.IntegrationTestsDriver IntegrationTestDataIngestSlowDeterministic
IntegrationTestingUtility.restoreCluster() is called in the teardown phase of the test.
For a distributed cluster, it ends up calling DistributedHBaseCluster.restoreClusterStatus, which restores the cluster to its original state.
The restore steps performed there do not handle one specific case:
the initial HBase Master is down, and the current HBase Master is different from the initial one.
In that case you get into this flow:
//check whether current master has changed
if (!ServerName.isSameHostnameAndPort(initial.getMaster(), current.getMaster())) {
  .............
}
In this code path, the current backup masters are stopped, and the current active master is also stopped.
At this point, in the case described above, no HBase Master is available, so any subsequent operation on the cluster fails, resulting in a test failure.

  was:
Let me split this requirement into 2 parts:
i) ChaosMonkey
I was trying to add more tests around new actions and policies by leveraging the existing classes nested inside ChaosMonkey.
But it turned out that some of the classes cannot be used outside the package unless we make them visible.
Here is an example:
I cannot extend ChaosMonkey.Action, because the init(ActionContext context) method has package-private visibility.
There are other places as well that make it impossible for anyone to build on top of this hierarchy.
ii) LoadTestTool
I wanted to extend this tool to define pass/fail criteria based on the percentage of failed reads/writes, rather than comparing against an absolute 0.
For that, this beautiful class should make some of its properties usable by its subclasses by marking them protected.
I wanted to get unblocked here first.
Once this gets fixed, I think I can take up a JIRA item to refactor these tools, if required.


> Integration Tests fails because of bug in teardown phase where the cluster state is not being restored properly.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9085
>                 URL: https://issues.apache.org/jira/browse/HBASE-9085
>             Project: HBase
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.95.2
>            Reporter: gautam
>            Assignee: gautam
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>
> I was running the following test on a distributed cluster:
> bin/hbase org.apache.hadoop.hbase.IntegrationTestsDriver IntegrationTestDataIngestSlowDeterministic
> IntegrationTestingUtility.restoreCluster() is called in the teardown phase of the test.
> For a distributed cluster, it ends up calling DistributedHBaseCluster.restoreClusterStatus, which restores
> the cluster to its original state.
> The restore steps performed there do not handle one specific case:
> the initial HBase Master is down, and the current HBase Master is different from the initial one.
> In that case you get into this flow:
> //check whether current master has changed
> if (!ServerName.isSameHostnameAndPort(initial.getMaster(), current.getMaster())) {
>   .............
> }
> In this code path, the current backup masters are stopped, and the current active master is also stopped.
> At this point, in the case described above, no HBase Master is available, so any subsequent
> operation on the cluster fails, resulting in a test failure.
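
The failure case reported above can be reproduced in a small in-memory model. This is only a sketch under stated assumptions, not the HBase code or the actual patch: FakeCluster, restoreAsReported and restoreStartingInitialFirst are hypothetical names, masters are plain strings, and the model assumes that a surviving master takes over when the active one stops and that an initial master already running as a backup is left alone. The second method shows one possible guard (start the initial master first if it is down); whether the real fix does this is not stated in the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

public class RestoreMasterSketch {

    /** Hypothetical in-memory stand-in for the master processes of a cluster. */
    static final class FakeCluster {
        final Set<String> runningMasters = new LinkedHashSet<>();
        String activeMaster; // null when no master is active

        FakeCluster(String active, String... backups) {
            activeMaster = active;
            runningMasters.add(active);
            for (String b : backups) {
                runningMasters.add(b);
            }
        }

        void stop(String master) {
            runningMasters.remove(master);
            if (master.equals(activeMaster)) {
                // Assumption: a surviving master takes over; otherwise none is active.
                activeMaster = runningMasters.isEmpty() ? null : runningMasters.iterator().next();
            }
        }

        void start(String master) {
            runningMasters.add(master);
            if (activeMaster == null) {
                activeMaster = master;
            }
        }
    }

    /** Models the reported flow: stop the backup masters, then stop the active master. */
    static void restoreAsReported(String initialMaster, FakeCluster cluster) {
        String currentActive = cluster.activeMaster;
        if (currentActive == null || initialMaster.equals(currentActive)) {
            return; // nothing to undo in this model
        }
        for (String m : new ArrayList<>(cluster.runningMasters)) {
            // Assumption: the initial master, if it is running as a backup, is left alone.
            if (!m.equals(currentActive) && !m.equals(initialMaster)) {
                cluster.stop(m);
            }
        }
        cluster.stop(currentActive);
    }

    /** One possible guard (an assumption): bring the initial master up before stopping others. */
    static void restoreStartingInitialFirst(String initialMaster, FakeCluster cluster) {
        if (!initialMaster.equals(cluster.activeMaster)
                && !cluster.runningMasters.contains(initialMaster)) {
            cluster.start(initialMaster);
        }
        restoreAsReported(initialMaster, cluster);
    }

    public static void main(String[] args) {
        // Initial master m1 is down; m2 is active and m3 is a backup.
        FakeCluster reported = new FakeCluster("m2", "m3");
        restoreAsReported("m1", reported);
        System.out.println("reported flow, active master: " + reported.activeMaster);

        FakeCluster guarded = new FakeCluster("m2", "m3");
        restoreStartingInitialFirst("m1", guarded);
        System.out.println("guarded flow, active master: " + guarded.activeMaster);
    }
}
```

In this model the reported flow leaves no master running when m1 is down, which matches the failure described above, while the guarded variant ends with m1 as the only running master. Restarting the original backup masters is left out of the sketch.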