hbase-issues mailing list archives

From "Benoit Sigoure (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2849) Clients stuck in loop doing "NIOServerCnxn: Closed socket connection"
Date Fri, 23 Jul 2010 23:02:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891848#action_12891848 ]

Benoit Sigoure commented on HBASE-2849:
---------------------------------------

http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/org/apache/zookeeper/ZooKeeper.html
bq. If for some reason, the client fails to send heart beats to the server for a prolonged
period of time (exceeding the sessionTimeout value, for instance), the server will expire
the session, and the session ID will become invalid. The client object will no longer be usable.
To make ZooKeeper API calls, the application must create a new client object.

So apparently, a new {{ZooKeeper}} object must be created once the session has become invalid.
 This seems like an awkward API, and I'm not sure why they did it this way.  In HBase's source code,
the only place that creates a {{ZooKeeper}} instance appears to be {{ZooKeeperWrapper#reconnectToZk}}.
 This method, although it's public, is only called from 3 other methods in that class: the
constructor, {{exists}} and {{deleteUnassignedRegion}}.  {{deleteUnassignedRegion}}
is only used by the master.  {{exists}} is only called from the following locations:
* {{ZKUnassignedWatcher}}'s constructor.  This is only used in the master.
* {{RSZookeeperUpdater#startRegionCloseEvent}}.  This is only used in the region server.
* {{ZooKeeperWrapper#createOrUpdateUnassignedRegion}}.  This is only used by the master's
{{RegionManager}}.
* {{ZooKeeperWrapper#createUnassignedRegion}} and {{ZooKeeperWrapper#updateUnassignedRegion}}.
 Those two methods, even though they're public, are only called from {{ZooKeeperWrapper#createOrUpdateUnassignedRegion}},
which itself is only used by the master's {{RegionManager}}.

In other words, for someone writing an HBase application, only a single {{ZooKeeper}} instance
ever gets created, when the {{ZooKeeperWrapper}} is instantiated.  Any failure that causes the client's
session to become invalid is unrecoverable with the current code, and the client has to
be killed and restarted.
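
To illustrate the kind of recovery that is missing, here is a minimal sketch of a holder that replaces its handle once the session has expired instead of keeping the dead one forever. Everything in it is hypothetical: {{Session}}, {{ReconnectingSession}} and {{FakeSession}} are stand-ins written for this example, not ZooKeeper or HBase classes. With the real client, expiry is reported to the watcher as {{KeeperState.Expired}}, and the factory would be the place that constructs a new {{ZooKeeper}} object.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a ZooKeeper-like client handle; NOT the real API. */
interface Session {
    boolean isExpired();
    void close();
}

/**
 * Holds the current session and swaps in a fresh one once the old one has
 * expired, so callers never keep using a handle that can no longer work.
 */
final class ReconnectingSession {
    private final Supplier<Session> factory; // assumed: builds a brand-new client
    private Session current;
    private int created = 0;

    ReconnectingSession(Supplier<Session> factory) {
        this.factory = factory;
        this.current = create();
    }

    private Session create() {
        created++;
        return factory.get();
    }

    /** Returns a usable session, replacing the held one if it has expired. */
    synchronized Session get() {
        if (current.isExpired()) {
            current.close();    // release the dead handle
            current = create(); // an expired session cannot be revived
        }
        return current;
    }

    synchronized int sessionsCreated() {
        return created;
    }
}

/** Test double whose expiry can be toggled by hand. */
final class FakeSession implements Session {
    volatile boolean expired = false;
    volatile boolean closed = false;

    public boolean isExpired() { return expired; }
    public void close() { closed = true; }
}
```

A caller would always go through {{get()}} rather than caching the returned handle, which is the opposite of what the current code does by creating the instance once at construction time. Any watches registered on the old session would also have to be set again on the new one; the sketch does not attempt that.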

Jonathan, is the work being done for the master rewrite branch going to address this issue?
 Bear in mind that here I'm concerned about HBase *client* applications.

> Clients stuck in loop doing "NIOServerCnxn: Closed socket connection"
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2849
>                 URL: https://issues.apache.org/jira/browse/HBASE-2849
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>
> Someone made mention of this loop last week but I don't think I filed an issue.  Here
> is another instance, again from a secret hbase admirer:
> "It seems that when Zookeeper dies and restarts, all client applications need to be restarted
> too. I just restarted HBase in non-distributed mode (which includes a ZK) and now my application
> can't reconnect to ZK unless I restart it too.  I'm stuck in this loop:
> {code}
> 2010-07-19 00:13:05,725 INFO org.apache.zookeeper.server.NIOServerCnxn:
>   Closed socket connection for client /127.0.0.1:55153 (no session established for client)
> 2010-07-19 00:13:07,052 INFO org.apache.zookeeper.server.NIOServerCnxn:
>   Accepted socket connection from /127.0.0.1:55154
> 2010-07-19 00:13:07,053 INFO org.apache.zookeeper.server.NIOServerCnxn:
>   Refusing session request for client /127.0.0.1:55154 as it has seen zxid 0xf5 our last zxid is 0xd7
>   client must try another server
> {code}
> "

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

