hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12072) We are doing 35 x 35 retries for master operations
Date Thu, 25 Sep 2014 04:34:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147355#comment-14147355 ]

Ted Yu commented on HBASE-12072:
--------------------------------

I dug a bit more.
It looks like limiting the change to HBaseAdmin would be better.
Currently connection.getKeepAliveMasterService() is called inside MasterCallable#prepare(),
which means it is called again on every retry of RpcRetryingCaller#callWithRetries(), and
each of those calls runs its own retry loop in StubMaker.makeStub().

One solution is to move the call to connection.getKeepAliveMasterService() into the
constructor of HBaseAdmin#MasterCallable.
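
To make the multiplication concrete, here is a minimal, self-contained sketch. It is not
the HBase code: NestedRetryDemo, RetryingCallable, EagerCallable and the two *Behavior
methods are hypothetical stand-ins, makeStub() and callWithRetries() only borrow the names
of the real methods, and the sketch assumes both loops are limited to 35 attempts, never
sleep, and never reach the master.

```java
import java.io.IOException;

public class NestedRetryDemo {
    static final int RETRIES = 35;   // assumed limit for both loops
    static int stubAttempts = 0;     // counts inner attempts to reach the master

    // Hypothetical stand-in for the inner loop in StubMaker.makeStub(); always fails here.
    static Object makeStub() throws IOException {
        for (int i = 1; i <= RETRIES; i++) {
            stubAttempts++;          // one failed "getMaster attempt i of 35"
        }
        throw new IOException("Can't get master address from ZooKeeper; znode data == null");
    }

    interface RetryingCallable {
        void prepare() throws IOException;
        Object call() throws IOException;
    }

    // Hypothetical stand-in for the outer loop in RpcRetryingCaller.callWithRetries().
    static Object callWithRetries(RetryingCallable callable) throws IOException {
        IOException last = null;
        for (int i = 1; i <= RETRIES; i++) {
            try {
                callable.prepare();  // runs again on every outer retry
                return callable.call();
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }

    // Variant that obtains the stub once, in its constructor.
    static final class EagerCallable implements RetryingCallable {
        private final Object stub;
        EagerCallable() throws IOException { this.stub = makeStub(); }
        public void prepare() { }
        public Object call() { return stub; }
    }

    // Stub obtained in prepare(): the inner loop runs once per outer retry.
    static int currentBehavior() {
        stubAttempts = 0;
        try {
            callWithRetries(new RetryingCallable() {
                private Object stub;
                public void prepare() throws IOException { stub = makeStub(); }
                public Object call() { return stub; }
            });
        } catch (IOException expected) {
            // the master is never reachable in this sketch
        }
        return stubAttempts;
    }

    // Stub obtained in the constructor: the inner loop runs once in total.
    static int proposedBehavior() {
        stubAttempts = 0;
        try {
            callWithRetries(new EagerCallable());
        } catch (IOException expected) {
            // construction fails before the outer loop starts
        }
        return stubAttempts;
    }

    public static void main(String[] args) {
        System.out.println("stub in prepare():   " + currentBehavior() + " attempts");
        System.out.println("stub in constructor: " + proposedBehavior() + " attempts");
    }
}
```

One consequence of the second variant is that a failure to reach the master surfaces when
the callable is constructed rather than inside the retrying caller.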

> We are doing 35 x 35 retries for master operations
> --------------------------------------------------
>
>                 Key: HBASE-12072
>                 URL: https://issues.apache.org/jira/browse/HBASE-12072
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Ted Yu
>         Attachments: 12072-v1.txt
>
>
> For master requests, there are two retry mechanisms in effect. The first one is from HBaseAdmin.executeCallable()
> {code}
>   private <V> V executeCallable(MasterCallable<V> callable) throws IOException {
>     RpcRetryingCaller<V> caller = rpcCallerFactory.newCaller();
>     try {
>       return caller.callWithRetries(callable);
>     } finally {
>       callable.close();
>     }
>   }
> {code}
> And inside, the other one is from StubMaker.makeStub():
> {code}
> /**
>        * Create a stub against the master.  Retry if necessary.
>        * @return A stub to do <code>intf</code> against the master
>        * @throws MasterNotRunningException
>        */
>       @edu.umd.cs.findbugs.annotations.SuppressWarnings (value="SWL_SLEEP_WITH_LOCK_HELD")
>       Object makeStub() throws MasterNotRunningException {
> {code}
> The tests will just hang for 10 min * 35 ~= 6 hours.
> {code}
> 2014-09-23 16:19:05,151 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 1 of 35 failed; retrying after sleep of 100, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,253 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 2 of 35 failed; retrying after sleep of 200, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,456 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 3 of 35 failed; retrying after sleep of 300, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,759 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 4 of 35 failed; retrying after sleep of 500, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:06,262 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 5 of 35 failed; retrying after sleep of 1008, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:07,273 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 6 of 35 failed; retrying after sleep of 2011, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:09,286 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 7 of 35 failed; retrying after sleep of 4012, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:13,303 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 8 of 35 failed; retrying after sleep of 10033, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:23,343 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 9 of 35 failed; retrying after sleep of 10089, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:33,439 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 10 of 35 failed; retrying after sleep of 10027, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:43,473 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 11 of 35 failed; retrying after sleep of 10004, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:53,485 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 12 of 35 failed; retrying after sleep of 20160, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:20:13,656 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 13 of 35 failed; retrying after sleep of 20006, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:20:33,675 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 14 of 35 failed; retrying after sleep of 20076, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:20:53,762 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 15 of 35 failed; retrying after sleep of 20077, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:21:13,852 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 16 of 35 failed; retrying after sleep of 20103, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:21:33,967 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 17 of 35 failed; retrying after sleep of 20136, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:21:54,115 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 18 of 35 failed; retrying after sleep of 20147, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:22:14,274 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 19 of 35 failed; retrying after sleep of 20131, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:22:34,417 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 20 of 35 failed; retrying after sleep of 20171, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:22:54,601 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 21 of 35 failed; retrying after sleep of 20177, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:23:14,790 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 22 of 35 failed; retrying after sleep of 20193, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:23:34,996 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 23 of 35 failed; retrying after sleep of 20195, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:23:55,203 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 24 of 35 failed; retrying after sleep of 20107, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:24:15,322 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 25 of 35 failed; retrying after sleep of 20186, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:24:35,520 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 26 of 35 failed; retrying after sleep of 20106, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:24:55,638 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 27 of 35 failed; retrying after sleep of 20173, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:25:15,824 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 28 of 35 failed; retrying after sleep of 20136, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:25:35,973 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 29 of 35 failed; retrying after sleep of 20188, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:25:56,174 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 30 of 35 failed; retrying after sleep of 20144, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:26:16,330 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 31 of 35 failed; retrying after sleep of 20106, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:26:36,448 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 32 of 35 failed; retrying after sleep of 20003, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:26:56,463 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 33 of 35 failed; retrying after sleep of 20114, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:16,590 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 34 of 35 failed; retrying after sleep of 20154, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:36,756 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 35 of 35 failed; no more retrying.
> java.io.IOException: Can't get master address from ZooKeeper; znode data == null
> 	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:114)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1554)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1599)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1653)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1860)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3359)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:122)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:92)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3386)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2201)
> 	at org.apache.hadoop.hbase.DistributedHBaseCluster.getClusterStatus(DistributedHBaseCluster.java:74)
> 	at org.apache.hadoop.hbase.DistributedHBaseCluster.<init>(DistributedHBaseCluster.java:57)
> 	at org.apache.hadoop.hbase.IntegrationTestingUtility.createDistributedHBaseCluster(IntegrationTestingUtility.java:140)
> 	at org.apache.hadoop.hbase.IntegrationTestingUtility.initializeCluster(IntegrationTestingUtility.java:75)
> 	at org.apache.hadoop.hbase.IntegrationTestManyRegions.setUp(IntegrationTestManyRegions.java:80)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.junit.runners.Suite.runChild(Suite.java:127)
> 	at org.junit.runners.Suite.runChild(Suite.java:26)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
> 	at org.apache.hadoop.hbase.IntegrationTestsDriver.doWork(IntegrationTestsDriver.java:110)
> 	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:46)
> 2014-09-23 16:27:37,061 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 1 of 35 failed; retrying after sleep of 100, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:37,163 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 2 of 35 failed; retrying after sleep of 200, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:37,365 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 3 of 35 failed; retrying after sleep of 301, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:37,669 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 4 of 35 failed; retrying after sleep of 504, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:38,176 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 5 of 35 failed; retrying after sleep of 1008, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:39,185 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 6 of 35 failed; retrying after sleep of 2018, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:41,207 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 7 of 35 failed; retrying after sleep of 4019, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:45,231 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 8 of 35 failed; retrying after sleep of 10004, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:27:55,241 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 9 of 35 failed; retrying after sleep of 10005, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:28:05,253 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 10 of 35 failed; retrying after sleep of 10099, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:28:15,359 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 11 of 35 failed; retrying after sleep of 10059, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:28:25,425 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 12 of 35 failed; retrying after sleep of 20069, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:28:45,507 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 13 of 35 failed; retrying after sleep of 20006, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:29:05,525 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 14 of 35 failed; retrying after sleep of 20186, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:29:25,723 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 15 of 35 failed; retrying after sleep of 20080, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:29:45,814 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 16 of 35 failed; retrying after sleep of 20001, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:30:05,826 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 17 of 35 failed; retrying after sleep of 20019, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:30:25,857 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 18 of 35 failed; retrying after sleep of 20159, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:30:46,028 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 19 of 35 failed; retrying after sleep of 20170, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:31:06,211 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 20 of 35 failed; retrying after sleep of 20146, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:31:26,368 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 21 of 35 failed; retrying after sleep of 20138, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:31:46,518 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 22 of 35 failed; retrying after sleep of 20140, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:32:06,670 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 23 of 35 failed; retrying after sleep of 20196, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:32:26,878 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 24 of 35 failed; retrying after sleep of 20123, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:32:47,013 INFO  [main] client.ConnectionManager$HConnectionImplementation:
getMaster attempt 25 of 35 failed; retrying after sleep of 20033, exception=java.io.IOException:
Can't get master address from ZooKeeper; znode data == null
> {code}
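
As a rough cross-check of the estimate in the description, the sleeps in the log above can
be summed. The sketch below is an approximation under stated assumptions: the multiplier
table and the 100 ms base pause are read off the logged sleep values (100, 200, 300, 500,
about 1000, 2000, 4000, 10000 and 20000 ms), the small jitter visible in the log is
ignored, and BackoffEstimate and its methods are hypothetical names.

```java
public class BackoffEstimate {
    // Multipliers inferred from the logged sleeps; the last entry repeats for later attempts.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200};
    static final long PAUSE_MS = 100; // base pause inferred from "sleep of 100" on attempt 1

    // Sleep after the given 1-based failed attempt, without jitter.
    static long sleepMs(int attempt) {
        int idx = Math.min(attempt - 1, BACKOFF.length - 1);
        return PAUSE_MS * BACKOFF[idx];
    }

    // Total sleep for one inner cycle; the final attempt is not followed by a sleep.
    static long totalSleepMs(int attempts) {
        long total = 0;
        for (int a = 1; a < attempts; a++) {
            total += sleepMs(a);
        }
        return total;
    }

    public static void main(String[] args) {
        long oneCycle = totalSleepMs(35);
        System.out.println("one inner cycle: " + oneCycle + " ms");
        System.out.println("35 inner cycles: " + (35 * oneCycle) / 60000 + " min");
    }
}
```

Under these assumptions one inner cycle sleeps for 508100 ms, about 8.5 minutes, which
agrees with the log timestamps (16:19:05 to 16:27:36). Thirty-five such cycles come to
roughly five hours before counting any sleeps in the outer loop, the same order of
magnitude as the estimate above.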



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
