hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
Date Thu, 03 May 2012 19:50:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267743#comment-13267743 ]

stack commented on HBASE-5883:
------------------------------

Can't we at least check the message to ensure it's what we expect?  (See the second catch below
where we look for "connection reset").  Can we be sure that what comes up here is the ConnectException
we set down in HBaseRPC?

{code}
+      if (ioe instanceof ConnectException) {
+        // Catch. Connect refused.
{code}

This redoing of an exception seems problematic.  Is it really necessary?

{code}
+        } else if (ioex.getMessage().toLowerCase()
+            .contains("connection refused")) {
+          ce = new ConnectException(ioex.getMessage());
+          ioe = ce;
{code}
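
If the rewrap stays, one thing to watch is that `new ConnectException(ioex.getMessage())` drops the original exception and its stack trace. Below is a minimal sketch, not taken from the patch, of keeping the original as the cause. The names `Rewrap` and `asConnectException` are hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper for illustration only; not part of the HBASE-5883 patch.
final class Rewrap {
  private Rewrap() {}

  /**
   * Returns ioex itself if it already is a ConnectException; otherwise returns a
   * new ConnectException with the same message and ioex attached as its cause.
   */
  static ConnectException asConnectException(IOException ioex) {
    if (ioex instanceof ConnectException) {
      return (ConnectException) ioex;
    }
    ConnectException ce = new ConnectException(ioex.getMessage());
    // ConnectException has no (String, Throwable) constructor,
    // so the cause has to be attached with initCause.
    ce.initCause(ioex);
    return ce;
  }
}
```

With this shape the original stack trace still shows up under "Caused by" when the rewrapped exception is logged.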

I'd feel better about this fix if we could figure out where the exception came from. (It's not from
the rpc stringifying of exceptions to pass them from server to client?)
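
The trace in the report shows the IOException carrying a `Caused by: java.net.ConnectException`, so for a failure raised locally the type could be checked through the cause chain rather than through the message text. Here is a minimal sketch of that check. `ConnectFailures` and `findConnectException` are hypothetical names, and the depth limit of 32 is an arbitrary assumption.

```java
import java.net.ConnectException;

// Hypothetical helper for illustration only; not part of the HBASE-5883 patch.
final class ConnectFailures {
  private ConnectFailures() {}

  /**
   * Walks the cause chain of t and returns the first ConnectException found,
   * or null if there is none.
   */
  static ConnectException findConnectException(Throwable t) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    for (int depth = 0; t != null && depth < 32; depth++) {
      if (t instanceof ConnectException) {
        return (ConnectException) t;
      }
      t = t.getCause();
    }
    return null;
  }
}
```

An exception that only mentions "Connection refused" in its message, with no ConnectException in its cause chain, returns null here, which may help tell a locally raised failure apart from one whose text was passed along as a string.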
                
> Backup master is going down due to connection refused exception
> ---------------------------------------------------------------
>
>                 Key: HBASE-5883
>                 URL: https://issues.apache.org/jira/browse/HBASE-5883
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0
>            Reporter: Gopinathan A
>            Assignee: Jieshan Bean
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch
>
>
> The active master node's network was down for some time (this node contains Master, DN, ZK, RS).
> The backup node got the notification and started to become active. Immediately, the backup node
> aborted with the exception below.
> {noformat}
> 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
> 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: java.net.ConnectException: Connection refused
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
> 	at $Proxy13.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
> 	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
> 	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
> 	... 20 more
> 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
