hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1439) race between master and regionserver after missed heartbeat
Date Tue, 14 Jul 2009 19:17:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731061#action_12731061
] 

Jean-Daniel Cryans commented on HBASE-1439:
-------------------------------------------

This log from a recent trunk makes me think we should still do that special watcher (the master
split logs around 14:11:58):

{code}
2009-07-13 14:10:44,219 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed
on region t3,,1247502416090 in 0sec
2009-07-13 14:10:46,735 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
t3,******************,1247507770509: Overloaded
2009-07-13 14:10:46,743 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE:
t3******************,1247507770509: Overloaded
2009-07-13 14:10:46,743 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t3,******************,1247507770509:
compactions & flushes disabled 
2009-07-13 14:10:46,743 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled
for region, no outstanding scanners on t3,**************,1247507770509
2009-07-13 14:10:46,744 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region t3,U**************9,1247507770509
2009-07-13 14:10:46,744 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed block
2009-07-13 14:10:46,744 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t3,,******************,1247507770509
2009-07-13 14:11:54,283 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats:
Sizes: Total=336.81897MB (353180272), Free=59.856064MB (62763632), Max=396.67502MB (415943904),
Counts: Blocks=5251, Access=259404, Hit=5232, Miss=2
54172, Evictions=13, Evicted=8223, Ratios: Hit Ratio=2.016931027173996%, Miss Ratio=97.98306822776794%,
Evicted/Run=632.5384521484375
2009-07-13 14:11:54,375 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12274aa40060001
to sun.nio.ch.SelectionKeyImpl@2b20bf2c
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:11:54,375 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22274aa40700002
to sun.nio.ch.SelectionKeyImpl@12746ad0
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:11:58,996 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-07-13 14:11:59,002 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing
message (Retry: 0)
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
...
2009-07-13 14:11:59,097 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded
max retries: 10
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:561)
        at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
**************/**************:2181
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/**************53663 remote=**************/**************:2181]
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
**************/**************:2181
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/**************:44760 remote=**************/**************:2181]
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-07-13 14:12:00,103 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
event, state: Expired, type: None, path: null
2009-07-13 14:12:00,104 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12274aa40060001
to sun.nio.ch.SelectionKeyImpl@5e297e83
java.io.IOException: Session Expired
        at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-07-13 14:12:00,104 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper
session expired
2009-07-13 14:12:00,104 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Restarting
Region Server
2009-07-13 14:12:00,105 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded
max retries: 10
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:561)
        at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:00,106 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4, memstoreSize=96, usedHeap=1153,
maxHeap=1983, blockCacheSize=353180272,
 blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:00,110 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler
9 on 60020 took 1033ms appending an edit to hlog; editcount=60781
2009-07-13 14:12:04,256 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler
9 on 60020 took 4128ms appending an edit to hlog; editcount=62375
2009-07-13 14:12:04,257 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stop was
requested, clearing the toDo despite of the exception
2009-07-13 14:12:04,257 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22274aa40700002
to sun.nio.ch.SelectionKeyImpl@54f015d5
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:12:04,257 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
 2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020:
exiting
2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020:
exiting
2009-07-13 14:12:04,258 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener
on 60020
2009-07-13 14:12:04,282 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
infoServer
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver/**************:60020.cacheFlusher
exiting
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.Leases: regionserver/**************:60020.leaseChecker
closing leases
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread
exiting
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.Leases: regionserver/**************:60020.leaseChecker
closed leases
2009-07-13 14:12:04,318 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/**************:60020.logFlusher
exiting
2009-07-13 14:12:04,484 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-07-13 14:12:04,494 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020:
exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020:
exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020:
exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020:
exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020:
exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020:
exiting
2009-07-13 14:12:04,513 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020:
exiting
2009-07-13 14:12:05,076 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing hlog writer
in hdfs://**************:50000/hbase/.logs/**************,60020,1247497897306
2009-07-13 14:12:05,076 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException:
Cannot append; log is closed
2009-07-13 14:12:05,076 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/**************:60020.compactor
exiting
2009-07-13 14:12:05,076 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver/**************:60020.majorCompactionChecker exiting
2009-07-13 14:12:05,138 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to
close log in abort
java.io.IOException: java.io.IOException: Could not complete write to file /hbase/.logs/**************60020,1247497897306/hlog.dat.1247508029747
by DFSClient_51605845
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:626)
        at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:05,138 INFO org.apache.hadoop.fs.FileSystem: Could not cancel cleanup thread,
though no FileSystems are open
2009-07-13 14:12:05,138 ERROR org.apache.hadoop.hbase.util.FSUtils: file system close failed:

java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3127)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3170)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3115)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1002)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:208)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
        at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:119)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:861)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:809)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:792)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1793)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
2009-07-13 14:12:05,138 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing
region t3,**************
2009-07-13 14:12:05,138 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting
down HRegionServer: file system not available
java.io.IOException: File system is not available
        at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:123)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:861)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:809)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:792)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1793)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
Caused by: java.io.IOException: Call to **************1/**************:50000 failed on local
exception: java.nio.channels.ClosedChannelException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:587)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
        at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:112)
        ... 9 more
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
2009-07-13 14:12:05,138 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t3,**************:
compactions & flushes disabled 
2009-07-13 14:12:05,143 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled
for region, no outstanding scanners on t3,**************
2009-07-13 14:12:05,143 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4, memstoreSize=96, usedHeap=1156,
maxHeap=1983, blockCacheSize=353180272,
 blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:05,143 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region t3,**************
2009-07-13 14:12:05,144 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020,
call put([B@1df68388, [Lorg.apache.hadoop.hbase.client.Put;@23356516) from **************:48126:
error: java.io.IOException: Cannot append; log is
 closed
java.io.IOException: Cannot append; log is closed
        at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1446)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1246)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1209)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1785)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
2009-07-13 14:12:05,151 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing
region t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t3,**************:
compactions & flushes disabled 
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled
for region, no outstanding scanners on t3**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing
region t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t3,,1247502416090:
compactions & flushes disabled 
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled
for region, no outstanding scanners on t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing
region t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t3,**************:
compactions & flushes disabled 
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled
for region, no outstanding scanners on t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region t3,**************
-07-13 14:12:05,153 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call put([B@1df68388,
[Lorg.apache.hadoop.hbase.client.Put;@23356516) from **************:48126: output error
2009-07-13 14:12:05,156 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020
caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1122)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:613)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:677)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)

2009-07-13 14:12:05,156 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020:
exiting
2009-07-13 14:12:05,156 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed block
2009-07-13 14:12:05,156 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t3,**************
2009-07-13 14:12:05,156 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting
server at: **************:60020
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session:
0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for
session: 0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12274aa40060001 closed
2009-07-13 14:12:05,157 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection
with ZooKeeper
2009-07-13 14:12:05,157 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/**************:60020
exiting
2009-07-13 14:12:05,172 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: Initializing RPC
Metrics with hostName=HRegionServer, port=60020
2009-07-13 14:12:05,181 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: globalMemStoreLimit=793.4m,
globalMemStoreLimitLowMark=495.8m, maxHeap=1.9g
2009-07-13 14:12:05,181 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting
shutdown thread.
2009-07-13 14:12:05,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown
thread complete
2009-07-13 14:12:05,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every
10000000ms
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
**************/**************:2181
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/**************:33151 remote=**************/**************:2181]
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-07-13 14:12:05,380 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,
host=*****************:2181 sessionTimeout=30000 watcher=org.apache.hadoop.hbase.regionserver.HRegionServer@47315d34
2009-07-13 14:12:10,636 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-07-13 14:12:10,636 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
**************/**************:2181
2009-07-13 14:12:10,636 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:710)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:425)
        at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:10,636 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled
exception. Aborting...
java.io.IOException: Region server startup failed
        at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:829)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:748)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:425)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:710)
      ... 2 more
2009-07-13 14:12:10,636 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4, memstoreSize=96, usedHeap=1173,
maxHeap=1983, blockCacheSize=353180272,
 blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:10,636 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-07-13 14:12:11,084 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/**************:52015 remote=**************/**************:2181]
2009-07-13 14:12:11,084 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-07-13 14:12:11,086 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2009-07-13 14:12:11,086 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22274aa40700002
to sun.nio.ch.SelectionKeyImpl@411cac1d
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:12:11,090 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
infoServer
2009-07-13 14:12:11,135 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting
server at: **************:60020
2009-07-13 14:12:11,135 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x0
2009-07-13 14:12:11,135 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session:
0x0
2009-07-13 14:12:11,365 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
event, state: SyncConnected, type: None, path: null
2009-07-13 14:12:11,365 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Ignoring
ZooKeeper event while shutting down
2009-07-13 14:12:11,417 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
**************/**************:2181
2009-07-13 14:12:11,421 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/**************52016 remote=**************/**************:2181]
2009-07-13 14:12:11,421 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-07-13 14:12:11,422 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22274aa40700002
to sun.nio.ch.SelectionKeyImpl@44a45d2c
java.io.IOException: Session Expired
        at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-07-13 14:12:11,422 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Got ZooKeeper event, state: Expired, type: None, path: null
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session:
0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for
session: 0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ZooKeeper: Session: 0x22274aa40700002 closed
2009-07-13 14:12:11,422 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection
with ZooKeeper
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-07-13 14:12:11,997 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send
thread for session 0x12274aa40060005 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0
lim=4 cap=4]
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for
session: 0x12274aa40060005
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12274aa40060005 closed
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-07-13 14:12:12,101 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection
with ZooKeeper
2009-07-13 14:12:12,149 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/**************:60020
exiting
{code}

> race between master and regionserver after missed heartbeat
> -----------------------------------------------------------
>
>                 Key: HBASE-1439
>                 URL: https://issues.apache.org/jira/browse/HBASE-1439
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.1
>         Environment: CentOS 5.2 x86_64, HBase 0.19.1, Hadoop 0.19.1
>            Reporter: Andrew Purtell
>            Priority: Blocker
>             Fix For: 0.20.1
>
>
> Seen on one of our 0.19.1 clusters:
> {code}
> java.io.FileNotFoundException: File does not exist: hdfs://jdc2-atr-dc-2.atr.trendmicro.com:50000
> /data/hbase/log_10.3.134.207_1242286427894_60020/hlog.dat.1242528291898
>  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>  at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
>  at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
>  at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
>  at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:753)
>  at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:716)
>  at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
>  at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:442)
>  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:377)
> 2009-05-17 04:05:55,481 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process
> shutdown of server 10.3.134.207:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions:
1,
> onlineMetaRegions.size(): 1
> {code}
> I do not have the region server log yet, but here is my conjecture:
> Here, the write ahead log for 10.3.134.207 is missing in DFS: java.io.FileNotFoundException:
hdfs://jdc2-atr-dc-2.atr.trendmicro.com:50000/data/hbase/log_10.3.134.207_1242286427894_60020/hlog.dat.1242528291898
when the master tries to split it after declaring the region server crashed. There have been
recent trouble reports on this cluster that indicate severe memory stress, e.g. kernel panics
due to OOM. Based on that I think it is likely that the region server here missed a heartbeat
so the master declared it crashed and began to split the log. But, the log was then deleted
out from underneath the master's split thread. I think the region server was actually still
running but partially swapped out or the node was otherwise overloaded so it missed its heartbeat.
Then, when the region server came back after being swapped in, it realized it missed its heartbeat
and shut down, deleting the log as is normally done. 
> Even if the above is not the actual cause in this case, I think the scenario is plausible.
What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message