hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7009) Port HBaseCluster interface/tests to 0.94
Date Tue, 06 Nov 2012 00:52:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491099#comment-13491099
] 

Enis Soztutar commented on HBASE-7009:
--------------------------------------

@Jimmy, are you referring to MiniHBaseCluster.getClientProtocol()? Is there any reason why
adding this would break backward compatibility (BC)?

Tested the patch: 
{code}
[root@ip-10-191-190-58 hbase]# bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey 
12/11/05 19:13:37 INFO util.ChaosMonkey: Sleeping for 17573 to add jitter
12/11/05 19:13:55 INFO util.ChaosMonkey: Performing action: Restart random region server
12/11/05 19:13:55 INFO util.ChaosMonkey: Killing region server:ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.HBaseCluster: Aborting RS: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver
| grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver
| grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO util.ChaosMonkey: Killed region server:ip-10-72-242-62.ec2.internal,60020,1352160397949.
Reported num of rs:2
12/11/05 19:13:55 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:14:00 INFO util.ChaosMonkey: Starting region server:ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.HBaseCluster: Starting RS on: ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh
--config /root/hbase/bin/../conf start regionserver , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:14:02 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting
regionserver, logging to /var/log/hbase/hbase-root-regionserver-ip-10-72-242-62.out

12/11/05 19:14:02 INFO util.ChaosMonkey: Started region server:ip-10-72-242-62.ec2.internal,60020,1352160397949.
Reported num of rs:2
....
{code}
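
For readers unfamiliar with the remote command in the log above, here is a minimal sketch of what the `ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2` pipeline computes, written as plain Java over already-captured `ps` output. The class `PsPidFilter` and method `pidsOf` are hypothetical names used for illustration; they are not part of the patch or of HBase, and the sample `ps` lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PsPidFilter {
    /**
     * Returns the second space-separated field (the PID column of `ps aux`)
     * for every line that mentions the service, skipping grep's own process line.
     */
    static List<String> pidsOf(List<String> psLines, String service) {
        List<String> pids = new ArrayList<>();
        for (String line : psLines) {
            if (!line.contains(service)) continue;          // grep <service>
            if (line.contains("grep")) continue;            // grep -v grep
            String squeezed = line.replaceAll(" +", " ");   // tr -s ' '
            String[] fields = squeezed.split(" ", -1);      // cut -d ' '
            if (fields.length >= 2) pids.add(fields[1]);    // -f2
        }
        return pids;
    }

    public static void main(String[] args) {
        // Made-up sample output, for illustration only.
        List<String> sample = Arrays.asList(
            "root      1234  0.0  1.0 java -Dproc_regionserver HRegionServer start",
            "root      5678  0.0  0.0 grep regionserver",
            "root      4321  0.0  1.0 java -Dproc_master HMaster start");
        System.out.println(pidsOf(sample, "regionserver")); // prints [1234]
    }
}
```

In the kill case the resulting PIDs are passed to `xargs kill -s SIGKILL`; in the wait-for-stop case an empty result means the process is gone, which matches the empty `output:` in the log.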

The only problem is that when the master is restarted, HConnection does not seem to pick up the new
master:
{code}
12/11/05 19:26:00 INFO util.ChaosMonkey: Killed master server:ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:26:00 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:26:05 INFO util.ChaosMonkey: Starting master:ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.HBaseCluster: Starting Master on: ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh
--config /root/hbase/bin/../conf start master , hostname:ip-10-191-190-58.ec2.internal
12/11/05 19:26:06 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting
master, logging to /var/log/hbase/hbase-root-master-ip-10-191-190-58.out

12/11/05 19:26:06 INFO client.HConnectionManager$HConnectionImplementation: Exception contacting
master. Retrying...
java.io.IOException: Call to ip-10-191-190-58.ec2.internal/10.191.190.58:60000 failed on local
exception: java.io.EOFException
12/11/05 19:27:06 WARN hbase.HBaseCluster: Master not started yet org.apache.hadoop.hbase.MasterNotRunningException
12/11/05 19:27:07 INFO util.ChaosMonkey: Started master: ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:27:07 INFO util.ChaosMonkey: Performing action: Batch restarting 50% of region
servers
12/11/05 19:27:07 WARN util.ChaosMonkey: Exception occured during performing action: org.apache.hadoop.hbase.MasterNotRunningException
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:713)
	at org.apache.hadoop.hbase.client.HBaseAdmin.getMaster(HBaseAdmin.java:213)
	at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:1632)
	at org.apache.hadoop.hbase.DistributedHBaseCluster.getClusterStatus(DistributedHBaseCluster.java:68)
	at org.apache.hadoop.hbase.util.ChaosMonkey$Action.getCurrentServers(ChaosMonkey.java:141)
	at org.apache.hadoop.hbase.util.ChaosMonkey$BatchRestartRs.perform(ChaosMonkey.java:277)
	at org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy.run(ChaosMonkey.java:393)
	at java.lang.Thread.run(Thread.java:662)
{code}

I am not sure whether the problem is in the backported patch or in 0.94.3 itself. Investigating
now.
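
As a stopgap while the root cause is unknown, the failing action could in principle be wrapped in a bounded retry. The sketch below is generic Java, not HBase API and not part of the patch: `MasterRetry` and `retry` are hypothetical names, and whether retrying (or re-creating the connection inside the retried call) actually helps here is an assumption that has not been verified.

```java
import java.util.concurrent.Callable;

public class MasterRetry {
    /**
     * Runs the call up to maxAttempts times, sleeping sleepMs between failed
     * attempts. Returns the first successful result; if every attempt fails,
     * throws IllegalStateException with the last failure as its cause.
     */
    static <T> T retry(Callable<T> call, int maxAttempts, long sleepMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IllegalStateException("interrupted while waiting", ie);
                }
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

A caller would pass the operation that currently throws MasterNotRunningException as the `Callable`. This only masks the symptom; it does not explain why the connection keeps failing after the master is back.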
                
> Port HBaseCluster interface/tests to 0.94
> -----------------------------------------
>
>                 Key: HBASE-7009
>                 URL: https://issues.apache.org/jira/browse/HBASE-7009
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: 0.94.3
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: 0.94.4
>
>         Attachments: HBASE-7009-p1.patch, HBASE-7009.patch, HBASE-7009-v2-squashed.patch
>
>
> Need to port. I am porting V5 patch from the original JIRA; I have a partially ported
> (V3) patch from Enis with protocol buffers being reverted to HRegionInterface/HMasterInterface

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
