hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eran Kutner <e...@gigya.com>
Subject Region server crashes when using replication
Date Tue, 22 Mar 2011 17:39:10 GMT
Hi,
I'm trying to use replication between two HBase clusters and I'm
encountering all kinds of crashes and weird behavior.

First, it seems that starting a region server when the peer ZKs are
not available will cause the server to fail to start:

2011-03-22 08:31:56,647 INFO
org.apache.hadoop.hbase.replication.ReplicationZookeeper: Replication
is now started
2011-03-22 08:31:56,668 WARN
org.apache.hadoop.hbase.zookeeper.ZKConfig:
java.net.UnknownHostException: haddop2-zk3
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
        at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1154)
        at java.net.InetAddress.getAllByName(InetAddress.java:1084)
        at java.net.InetAddress.getAllByName(InetAddress.java:1020)
        at java.net.InetAddress.getByName(InetAddress.java:970)
        at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:206)
        at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:250)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:113)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:288)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:253)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:182)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:142)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.<init>(Replication.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1092)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:875)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1472)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:563)
        at java.lang.Thread.run(Thread.java:662)

2011-03-22 08:31:56,669 WARN
org.apache.hadoop.hbase.zookeeper.ZKConfig:
java.net.UnknownHostException: haddop2-zk2
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
        at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1154)
        at java.net.InetAddress.getAllByName(InetAddress.java:1084)
        at java.net.InetAddress.getAllByName(InetAddress.java:1020)
        at java.net.InetAddress.getByName(InetAddress.java:970)
        at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:206)
        at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:250)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:113)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:288)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:253)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:182)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:142)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.<init>(Replication.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1092)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:875)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1472)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:563)
        at java.lang.Thread.run(Thread.java:662)

2011-03-22 08:31:56,669 INFO org.apache.zookeeper.ZooKeeper:
Initiating client connection,
connectString=haddop2-zk3:2181,haddop2-zk2:2181,hadoop2-zk1:2181
sessionTimeout=180000 watcher=connection to cluster: 1
2011-03-22 08:31:56,670 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed
initialization
2011-03-22 08:31:56,670 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.net.UnknownHostException: haddop2-zk3
        at java.net.InetAddress.getAllByName0(InetAddress.java:1158)
        at java.net.InetAddress.getAllByName(InetAddress.java:1084)
        at java.net.InetAddress.getAllByName(InetAddress.java:1020)
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:97)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:119)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:288)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:253)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:182)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:142)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.<init>(Replication.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1092)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:875)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1472)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:563)
        at java.lang.Thread.run(Thread.java:662)
2011-03-22 08:31:56,675 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
server serverName=hadoop1-s05.farm-ny.gigya.com,60020,1300797113247,
load=(requests=0, regions=0, usedHeap=24, maxHeap=987): Unhandled
exception: haddop2-zk3
java.net.UnknownHostException: haddop2-zk3
        at java.net.InetAddress.getAllByName0(InetAddress.java:1158)
        at java.net.InetAddress.getAllByName(InetAddress.java:1084)
        at java.net.InetAddress.getAllByName(InetAddress.java:1020)
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:97)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:119)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:288)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:253)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:182)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:142)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.<init>(Replication.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1092)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:875)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1472)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:563)
        at java.lang.Thread.run(Thread.java:662)
2011-03-22 08:31:56,675 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled
exception: haddop2-zk3
2011-03-22 08:31:56,675 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping server on 60020
2011-03-22 08:31:56,679 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Allocating
LruBlockCache with maximum size 197.5m
2011-03-22 08:31:56,683 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server
at: hadoop1-s05.farm-ny.gigya.com,60020,1300797113247
2011-03-22 08:31:56,683 DEBUG
org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog
tracker org.apache.hadoop.hbase.catalog.CatalogTracker@508aeb74
2011-03-22 08:31:56,684 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing
leases
2011-03-22 08:31:56,684 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed
leases
2011-03-22 08:31:56,684 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x22e669588a20058
2011-03-22 08:31:56,692 INFO org.apache.zookeeper.ZooKeeper: Session:
0x22e669588a20058 closed
2011-03-22 08:31:56,692 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 08:31:56,700 INFO org.apache.zookeeper.ZooKeeper: Session:
0x12e669588b8004d closed
2011-03-22 08:31:56,700 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 08:31:56,702 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
starting; hbase.shutdown.hook=true;
fsShutdownHook=Thread[Thread-15,5,main]
2011-03-22 08:31:56,702 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown
hook
2011-03-22 08:31:56,702 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs
shutdown hook thread.
2011-03-22 08:31:56,804 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
finished.


Second, it seems that when I'm shutting down a region server on the
peer cluster region servers on the source cluster connect to it are
also shutting down:
2011-03-22 09:03:34,541 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing
leases
2011-03-22 09:03:34,541 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed
leases
2011-03-22 09:03:34,644 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x12e669588b80050
2011-03-22 09:03:34,653 INFO org.apache.zookeeper.ZooKeeper: Session:
0x12e669588b80050 closed
2011-03-22 09:03:34,653 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 09:03:34,662 INFO org.apache.zookeeper.ZooKeeper: Session:
0x22e669588a2005d closed
2011-03-22 09:03:34,662 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 09:03:34,664 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Closing source 1 because: Region server is closing
2011-03-22 09:03:39,377 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Source exiting 1
2011-03-22 09:03:39,431 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Source exiting 1
2011-03-22 09:03:39,431 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
exiting
2011-03-22 09:03:39,432 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
starting; hbase.shutdown.hook=true;
fsShutdownHook=Thread[Thread-15,5,main]
2011-03-22 09:03:39,432 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown
hook
2011-03-22 09:03:39,432 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs
shutdown hook thread.
2011-03-22 09:03:39,433 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
finished.



Third, sometimes it crashes without any reason I can understand. See
the attached log dump. It includes the entire load process from start
to shutdown of the region server. When I configure "stop_replication"
everything is OK, here's what happens after "start_replication":
2011-03-22 09:38:59,199 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Replication is disabled, sleeping 1000 times 10
2011-03-22 09:38:59,333 INFO
org.apache.hadoop.hbase.replication.ReplicationZookeeper: Replication
is now started
2011-03-22 09:39:09,202 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Opening log for replication
hadoop1-s05.farm-ny.gigya.com%3A60020.1300799921876 at 124
2011-03-22 09:39:09,215 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
currentNbOperations:0 and seenEntries:1 and size: 191
2011-03-22 09:39:09,215 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
Going to report log
#hadoop1-s05.farm-ny.gigya.com%3A60020.1300799921876 for position 315
in hdfs://hadoop1-m1:8020/hbase/.logs/hadoop1-s05.farm-ny.gigya.com,60020,1300799918373/hadoop1-s05.farm-ny.gigya.com%3A60020.1300799921876
2011-03-22 09:39:09,224 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
server serverName=hadoop1-s05.farm-ny.gigya.com,60020,1300799918373,
load=(requests=0, regions=3, usedHeap=41, maxHeap=987): Writing
replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
= NoNode for /hbase/replication/rs/hadoop1-s05.farm-ny.gigya.com,60020,1300799918373/1/hadoop1-s05.farm-ny.gigya.com%3A60020.1300799921876
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:708)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:751)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:432)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:131)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332)
2011-03-22 09:39:09,225 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=3, stores=5, storefiles=5, storefileIndexSize=1,
memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=41,
maxHeap=987, blockCacheSize=1702768, blockCacheFree=205390992,
blockCacheCount=3, blockCacheHitCount=15, blockCacheMissCount=3,
blockCacheEvictedCount=0, blockCacheHitRatio=83,
blockCacheHitCachingRatio=83
2011-03-22 09:39:09,225 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Writing
replication status
2011-03-22 09:39:09,225 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
Removing 0 logs in the list: []
2011-03-22 09:39:09,225 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Nothing to replicate, sleeping 1000 times 10
2011-03-22 09:39:10,996 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping server on 60020
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 0 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 2 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 18 on 60020: exiting
2011-03-22 09:39:10,998 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
infoServer
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 30 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 5 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping IPC Server Responder
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 4 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 6 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 20 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 15 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 9 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 21 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 49 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 14 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 8 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 17 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 44 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 8 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 42 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 41 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 40 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 39 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 38 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 37 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 36 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 34 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 33 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 32 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 31 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 29 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 28 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 27 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 26 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 25 on 60020: exiting
2011-03-22 09:39:11,001 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 24 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 23 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 13 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 7 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 16 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 12 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 6 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 11 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 5 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 4 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 0 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 9 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 7 on 60020: exiting
2011-03-22 09:39:10,997 INFO org.apache.hadoop.ipc.HBaseServer:
Stopping IPC Server listener on 60020
2011-03-22 09:39:11,004 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:60030
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 35 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 43 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 45 on 60020: exiting
2011-03-22 09:39:11,000 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 46 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 47 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 48 on 60020: exiting
2011-03-22 09:39:11,010 INFO
org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2011-03-22 09:39:11,010 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLog:
regionserver60020.logSyncer interrupted while waiting for sync
requests
2011-03-22 09:39:11,010 INFO
org.apache.hadoop.hbase.regionserver.wal.HLog:
regionserver60020.logSyncer exiting
2011-03-22 09:39:11,010 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
hdfs://hadoop1-m1:8020/hbase/.logs/hadoop1-s05.farm-ny.gigya.com,60020,1300799918373
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 22 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 10 on 60020: exiting
2011-03-22 09:39:10,999 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 3 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 3 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 2 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: PRI
IPC Server handler 1 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 19 on 60020: exiting
2011-03-22 09:39:10,998 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 1 on 60020: exiting
2011-03-22 09:39:11,011 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Processing close of -ROOT-,,0.70236052
2011-03-22 09:39:11,011 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
regionserver60020.cacheFlusher exiting
2011-03-22 09:39:11,010 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Processing close of
TEST_Msg,MZVWNHOSUZYUOYNQKDIAVSCQEPHXVWVXIMGLGXGSSXQTZQMOZCZDCQAUWFSXARWYEBMBRCJMXPHXBIQNDTYTWRURMMOBFISBBSPYEKWWSNGMJCSOPFUGTDBMGUPFOIHOXGWI\x00,1300193788355.4ca0c6cf6654b8f6fd7e3bbba0b9fc6c.
2011-03-22 09:39:11,010 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Processing close of
TEST_UserSettings,,1300103207136.c438541b556672c4f4486416baa371f0.
2011-03-22 09:39:11,009 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver60020.compactor exiting
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Closing
TEST_UserSettings,,1300103207136.c438541b556672c4f4486416baa371f0.:
disabling compactions & flushes
2011-03-22 09:39:11,009 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver60020.majorCompactionChecker exiting
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region TEST_UserSettings,,1300103207136.c438541b556672c4f4486416baa371f0.
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed Settings
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed default
2011-03-22 09:39:11,012 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
TEST_UserSettings,,1300103207136.c438541b556672c4f4486416baa371f0.
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Closed region TEST_UserSettings,,1300103207136.c438541b556672c4f4486416baa371f0.
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Closing
TEST_Msg,MZVWNHOSUZYUOYNQKDIAVSCQEPHXVWVXIMGLGXGSSXQTZQMOZCZDCQAUWFSXARWYEBMBRCJMXPHXBIQNDTYTWRURMMOBFISBBSPYEKWWSNGMJCSOPFUGTDBMGUPFOIHOXGWI\x00,1300193788355.4ca0c6cf6654b8f6fd7e3bbba0b9fc6c.:
disabling compactions & flushes
2011-03-22 09:39:11,011 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Closing
-ROOT-,,0.70236052: disabling compactions & flushes
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region TEST_Msg,MZVWNHOSUZYUOYNQKDIAVSCQEPHXVWVXIMGLGXGSSXQTZQMOZCZDCQAUWFSXARWYEBMBRCJMXPHXBIQNDTYTWRURMMOBFISBBSPYEKWWSNGMJCSOPFUGTDBMGUPFOIHOXGWI\x00,1300193788355.4ca0c6cf6654b8f6fd7e3bbba0b9fc6c.
2011-03-22 09:39:11,012 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region -ROOT-,,0.70236052
2011-03-22 09:39:11,013 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed Data
2011-03-22 09:39:11,013 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed default
2011-03-22 09:39:11,013 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed info
2011-03-22 09:39:11,013 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
TEST_Msg,MZVWNHOSUZYUOYNQKDIAVSCQEPHXVWVXIMGLGXGSSXQTZQMOZCZDCQAUWFSXARWYEBMBRCJMXPHXBIQNDTYTWRURMMOBFISBBSPYEKWWSNGMJCSOPFUGTDBMGUPFOIHOXGWI\x00,1300193788355.4ca0c6cf6654b8f6fd7e3bbba0b9fc6c.
2011-03-22 09:39:11,013 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
-ROOT-,,0.70236052
2011-03-22 09:39:11,013 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Closed region TEST_Msg,MZVWNHOSUZYUOYNQKDIAVSCQEPHXVWVXIMGLGXGSSXQTZQMOZCZDCQAUWFSXARWYEBMBRCJMXPHXBIQNDTYTWRURMMOBFISBBSPYEKWWSNGMJCSOPFUGTDBMGUPFOIHOXGWI\x00,1300193788355.4ca0c6cf6654b8f6fd7e3bbba0b9fc6c.
2011-03-22 09:39:11,013 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Closed region -ROOT-,,0.70236052
2011-03-22 09:39:11,066 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server
at: hadoop1-s05.farm-ny.gigya.com,60020,1300799918373
2011-03-22 09:39:11,066 DEBUG
org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog
tracker org.apache.hadoop.hbase.catalog.CatalogTracker@2eb0a3f5
2011-03-22 09:39:11,066 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing
leases
2011-03-22 09:39:11,066 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed
leases
2011-03-22 09:39:11,169 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x22e669588a20061
2011-03-22 09:39:11,181 INFO org.apache.zookeeper.ZooKeeper: Session:
0x22e669588a20061 closed
2011-03-22 09:39:11,181 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 09:39:11,189 INFO org.apache.zookeeper.ZooKeeper: Session:
0x12e669588b80057 closed
2011-03-22 09:39:11,189 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-03-22 09:39:11,190 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Closing source 1 because: Region server is closing
2011-03-22 09:39:12,414 INFO
org.apache.hadoop.hbase.regionserver.Leases:
regionserver60020.leaseChecker closing leases
2011-03-22 09:39:12,415 INFO
org.apache.hadoop.hbase.regionserver.Leases:
regionserver60020.leaseChecker closed leases
2011-03-22 09:39:19,229 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Source exiting 1
2011-03-22 09:39:19,229 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
exiting
2011-03-22 09:39:19,230 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
starting; hbase.shutdown.hook=true;
fsShutdownHook=Thread[Thread-15,5,main]
2011-03-22 09:39:19,230 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown
hook
2011-03-22 09:39:19,230 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs
shutdown hook thread.
2011-03-22 09:39:19,231 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
finished.



Forth and probably worst of all, it seems that when the servers are
crashing this way the master still thinks they are alive so the region
is not transitioned and is therefor inaccessible. How long should it
normally take the master to detect a dead region server?

Any help on what's going on would be greatly appreciated.

-eran

Mime
View raw message