hbase-dev mailing list archives

From "Dima Spivak (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-12852) Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
Date Wed, 14 Jan 2015 07:50:35 GMT
Dima Spivak created HBASE-12852:
-----------------------------------

             Summary: Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
                 Key: HBASE-12852
                 URL: https://issues.apache.org/jira/browse/HBASE-12852
             Project: HBase
          Issue Type: Bug
          Components: integration tests
    Affects Versions: 0.98.6
            Reporter: Dima Spivak
            Assignee: Dima Spivak


I've just started working with hbase-it (so far only on 0.98.6) and want to file JIRAs for
the issues I encounter so that I don't forget to get to them. First up: tests run with
ChaosMonkey don't fail when ChaosMonkey itself fails to carry out its actions. As an example,
while running IntegrationTestIngest with a slowDeterministic CM, I forgot to set up SSH
properly and saw the following:
{code}
15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep proc_regionserver
| grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
failed at attempt 4. Retrying until maxAttempts: 5. Exception: stderr: Permission denied,
please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout: 
15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
Failed to write keys: 0
Key range: [150000..159999]
Batch updates: false
Percent of keys to update: 60
Updater threads: 10
Ignore nonce conflicts: true
Regions per server: 5
15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
Starting to mutate data...
15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, time=00:00:05
Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, latency=102 ms], wroteUpTo=149999
15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, time=00:00:10
Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, latency=77 ms], wroteUpTo=149999
15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | grep proc_regionserver
| grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh  node-5.internal "ps
aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing action: ExitCodeException
exitCode=255: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout: 
	at org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
	at org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
	at org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
	at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
	at org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
	at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
	at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
	at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
	at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
	at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
	at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
	at java.lang.Thread.run(Thread.java:745)
{code}

It seems to me that tests should fail in these instances rather than just log a warning. Was
this just an oversight, [~enis] and [~ndimiduk], or is this by design?
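
A minimal sketch of what failing the test could look like, assuming the policy keeps catching
exceptions from actions but also records them so the test can check them at the end. The names
here (ChaosFailureTracker, runAction, assertNoFailures) are hypothetical and are not part of
hbase-it; this only illustrates the idea and is not a proposed patch.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical helper: none of these names exist in the HBase code base. */
public class ChaosFailureTracker {

    /** A chaos action that may throw, e.g. because a remote SSH command failed. */
    public interface Action {
        void perform() throws Exception;
    }

    // Thread-safe list, since policies run on their own threads.
    private final List<Exception> failures = new CopyOnWriteArrayList<>();

    /** Runs the action; still logs a warning on failure, but also remembers it. */
    public void runAction(Action action) {
        try {
            action.perform();
        } catch (Exception e) {
            System.err.println("WARN Exception occurred while performing action: " + e);
            failures.add(e);
        }
    }

    /** Read-only view of the failures recorded so far. */
    public List<Exception> getFailures() {
        return Collections.unmodifiableList(failures);
    }

    /** Meant to be called by the test once the monkey stops; throws if any action failed. */
    public void assertNoFailures() {
        if (!failures.isEmpty()) {
            AssertionError error =
                new AssertionError(failures.size() + " chaos action(s) failed");
            for (Exception e : failures) {
                error.addSuppressed(e);
            }
            throw error;
        }
    }
}
```

With something like this, a test would call assertNoFailures() after stopping the monkey, so the
"Permission denied" case above would surface as a test failure instead of only a WARN line.
Whether to fail at the end or abort on the first failed action is a separate design choice.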



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
