hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
Date Sat, 10 Sep 2011 17:06:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102083#comment-13102083 ]

Jean-Daniel Cryans commented on HBASE-3130:

Here it goes:

2011-09-09 19:44:28,224 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING
region server serverName=sv4r17s40,60020,1313587209632, load=(requests=4292, regions=186,
usedHeap=11929, maxHeap=24749): connection to cluster: 5-0x130d4937f890066 connection to cluster:
5-0x130d4937f890066 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)


As you can see, the message is pretty generic; I could only trace it to the peer connection
through the "connection to cluster" string. The fix will take place around ReplicationPeer,
which contains a ZKW (ZooKeeperWatcher) that requires an Abortable. At the moment that
Abortable is the RS itself, so an expired session on the remote cluster aborts the whole
region server. Instead we should pass our own, or maybe ReplicationSource should implement it.
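
To illustrate the idea, here is a minimal, self-contained sketch of a peer-scoped Abortable.
It is not the actual patch. The Abortable interface below is a local stand-in reduced to a
single abort(String, Throwable) method, and PeerWatcher, PeerAbortable, sessionExpired(),
needsReconnect() and reconnected() are hypothetical names that are not part of HBase.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in for an Abortable, reduced to what this sketch needs (assumption).
interface Abortable {
    void abort(String why, Throwable e);
}

// Hypothetical stand-in for the ZKW held by ReplicationPeer: on session
// expiry it calls whatever Abortable it was constructed with.
class PeerWatcher {
    private final Abortable abortable;

    PeerWatcher(Abortable abortable) {
        this.abortable = abortable;
    }

    // Simulates receiving a session-expired event from the remote cluster.
    void sessionExpired() {
        abortable.abort("connection to cluster received expired from ZooKeeper",
                new IllegalStateException("Session expired"));
    }
}

// Hypothetical peer-scoped Abortable: instead of aborting the region server,
// it only records that the peer connection has to be re-established.
class PeerAbortable implements Abortable {
    private final AtomicBoolean needsReconnect = new AtomicBoolean(false);

    @Override
    public void abort(String why, Throwable e) {
        needsReconnect.set(true);
    }

    boolean needsReconnect() {
        return needsReconnect.get();
    }

    // Called once a fresh connection to the peer has been set up.
    void reconnected() {
        needsReconnect.set(false);
    }
}

class PeerAbortSketch {
    public static void main(String[] args) {
        PeerAbortable peerAbortable = new PeerAbortable();
        PeerWatcher watcher = new PeerWatcher(peerAbortable);
        watcher.sessionExpired();
        System.out.println("needs reconnect: " + peerAbortable.needsReconnect());
    }
}
```

With the RS passed as the Abortable, the same event ends in the abort shown in the log above.
With a peer-scoped one, the replication code could check the flag and rebuild the peer
connection on its own.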

> [replication] ReplicationSource can't recover from session expired on remote clusters
> -------------------------------------------------------------------------------------
>                 Key: HBASE-3130
>                 URL: https://issues.apache.org/jira/browse/HBASE-3130
>             Project: HBase
>          Issue Type: Bug
>          Components: replication
>            Reporter: Jean-Daniel Cryans
> Currently ReplicationSource cannot recover when its ZooKeeper connection to its remote
> cluster expires. HLogs are still being tracked, but a cluster restart is required to continue
> replication (or a rolling restart).

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

