hbase-user mailing list archives

From Lance Riedel <lancerie...@gmail.com>
Subject Crashing Region Servers
Date Thu, 09 Dec 2010 19:27:42 GMT

We have a 6-node cluster, 5 of which run region servers. Two of the region servers have been
stable for days, but three of them keep crashing. Below are the logs from around the time each
crash occurs. (btw, we are pushing roughly the Twitter firehose into HBase via Flume.) I'm an
HBase newbie, but I have been reading; I'm not sure what else is needed to help debug. When the
problem occurs, it is concurrent with issues that appear all the way down in the DataNode. Also
interesting to note: all three servers seem to fail somewhat differently.

Servers (6):
00.hadoop is the master
01.hadoop and 03.hadoop (there is no 02.hadoop; that flaky machine was taken out) are the stable region servers
04-06.hadoop are the region servers that crash, at different times and with very different logs


Master Attributes:
HBase Version: 0.89.20100924+28, r
Hadoop Version: 0.20.2+737, r98c55c28258aa6f42250569bd7fa431ac657bdbd
HBase Root Directory: hdfs://00.hadoop.****:54310/hbase
Load average: 103.0 (average number of regions per region server; naive computation)
Regions On FS: 577


(This is after recently restarting 04-06)


Address   Start Code    Load
01.hadoop:60030  1291268910638   requests=223, regions=118, usedHeap=519, maxHeap=987
03.hadoop:60030  1291269219610   requests=23, regions=111, usedHeap=862, maxHeap=987
04.hadoop:60030  1291910665912   requests=169, regions=82, usedHeap=194, maxHeap=888
05.hadoop:60030  1291909584060   requests=232, regions=110, usedHeap=477, maxHeap=888
06.hadoop:60030  1291909723787   requests=99, regions=94, usedHeap=394, maxHeap=888
Total:
servers: 5
requests=746, regions=515



Below are various logs from around the time of the failures on 04-06.hadoop.

************************************************************************************
04.hadoop  
************************************************************************************
REGION SERVER ->
2010-12-09 05:01:58,640 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed
on region article,a83858a08f2270d319f75a7b43c756c2453988e7,1291811929868.58f6d9fc80
c78f3ca490b0280b4f1226. in 0sec
2010-12-09 05:04:30,898 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=1.46
MB, free=176.33 MB, max=177.79 MB, blocks=0, accesses=200583, hits=0, hitRatio=
0.00%%, evictions=0, evicted=0, evictedPerRun=NaN
2010-12-09 05:04:59,612 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
Using syncFs -- HDFS-200
2010-12-09 05:04:59,618 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.1291870585253,
entries=5318
6, filesize=63776311. New hlog /hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.1291871099582
2010-12-09 05:08:02,033 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs
to remove  out of total 6; oldest outstanding sequenceid is 32891801 from region article
,4b2039b791e894dd479b90661ca97087f61645d3,1291813201039.7dff47c63bb5648bb1ee6670c60553a5.
2010-12-09 05:08:02,034 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler
13 on 60020 took 182416ms appending an edit to hlog; editcount=0
2010-12-09 05:08:02,066 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog
file /hbase/.logs/04.hadoop.****,60020,1291858770125/10.100.154.103%3A60020.12918679476
65 whose highest sequenceid is 32880227 to /hbase/.oldlogs/10.100.154.103%3A60020.1291867947665
2010-12-09 05:08:02,066 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have
not heard from server in 182936ms for sessionid 0x12ca58c08ff0048, closing socket conn
ection and attempting reconnect
2010-12-09 05:08:02,089 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting
region server serverName=04.hadoop.****,60020,1291858770125, load=(requests=24, regions=
64, usedHeap=548, maxHeap=888): Unhandled exception
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing 04.hadoop.****,60020,1291858770125 as dead s
erver
        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:744)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:561)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doRun(HRegionServer.java:635)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.access$000(HRegionServer.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$1.run(HRegionServer.java:518)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$1.run(HRegionServer.java:516)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:337)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1046)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:516)
        at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,090 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=64, stores=64, storefiles=136, storefileIndexSize=27, memsto
reSize=353, compactionQueueSize=0, usedHeap=549, maxHeap=888, blockCacheSize=1530552, blockCacheFree=184893160,
blockCacheCount=0, blockCacheHitRatio=0
2010-12-09 05:08:02,090 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2010-12-09 05:08:02,090 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on 60020: exiting
2010-12-09 05:08:02,090 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
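
One thing we plan to try on the crashing region servers is turning on GC logging, to see whether
the ~182-second gap above (the 182416ms append warning and the 182936ms of ZooKeeper silence)
lines up with a stop-the-world collection. A minimal sketch of what we would add to hbase-env.sh
(the GC log path is just our choice; the flags are the standard Sun JVM ones):

  # hbase-env.sh on each region server (sketch)
  # Write GC activity to a file so long pauses around the crash time are visible
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-regionserver.log"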


MASTER HBASE ->

2010-12-09 05:05:06,072 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2010-12-09 05:05:40,004 INFO org.apache.hadoop.hbase.master.ServerManager: 04.hadoop.jive,60020,1291858770125
znode expired
2010-12-09 05:05:40,005 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=04.hadoop.jive,60020,1291858770125
to dead servers, added shutdown processing operation
2010-12-09 05:05:40,005 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing todo: ProcessServerShutdown of 04.hadoop.jive,60020,1291858770125
2010-12-09 05:05:40,005 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown of server 04.hadoop.jive,60020,1291858770125: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-12-09 05:05:40,008 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 7 hlog(s) in hdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,1291858770125
2010-12-09 05:05:40,008 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 7: hdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665, length=63757709
2010-12-09 05:05:40,008 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://00.hadoop.jive:54310/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665


NAMENODE ->

2010-12-09 05:08:02,471 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
blockMap updated: 10.100.159.13:50010 is added to blk_1531008743226086399_251615 size 63757709
2010-12-09 05:08:02,473 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310,
call complete(/hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665,
DFSClient_hb_m_10.194.194.79:60000_1291788452343) from 10.194.194.79:44117: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
No lease on /hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665
File does not exist. [Lease.  Holder: DFSClient_hb_m_10.194.194.79:60000_1291788452343, pendingcreates:
1]
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/04.hadoop.jive,60020,1291858770125/10.100.154.103%3A60020.1291867947665
File does not exist. [Lease.  Holder: DFSClient_hb_m_10.194.194.79:60000_1291788452343, pendingcreates:
1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1488)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1479)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1534)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1522)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:610)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)
2010-12-09 05:08:04,206 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
blockMap updated: 10.99.97.106:50010 is added to blk_-734280257049179934_251614 size 7330

DATANODE ->

2010-12-09 05:08:02,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException
in BlockReceiver.lastNodeRun: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
        at sun.nio.ch.IOUtil.write(IOUtil.java:71)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
        at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError:
exception: 
2010-12-09 05:08:02,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError:
exception: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
        at sun.nio.ch.IOUtil.write(IOUtil.java:71)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
        at java.lang.Thread.run(Thread.java:662)
2010-12-09 05:08:02,221 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-8817504198034990390_251613 0 Exception java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
        at sun.nio.ch.IOUtil.write(IOUtil.java:71)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:133)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:840)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
        at java.lang.Thread.run(Thread.java:662)

2010-12-09 05:08:02,221 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
0 for block blk_-8817504198034990390_251613 terminating
2010-12-09 05:08:02,224 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8817504198034990390_251613
received exception java.io.IOException: Connection reset by peer
2010-12-09 05:08:02,224 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block
blk_4505306716668305567_250752 file /dist1/data/hadoop-data/current/subdir19/blk_4505306716668305567
for deletion
2010-12-09 05:08:02,226 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.100.154.103:50010,
storageID=DS-866555524-10.100.154.103-50010-1291262739506, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
        at sun.nio.ch.IOUtil.read(IOUtil.java:210)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:267)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2010-12-09 05:08:02,226 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block
blk_1422641301942366074_250732 at file /dist1/data/hadoop-data/current/subdir19/blk_1422641301942366074
2010-12-09 05:08:02,227 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block
blk_4505306716668305567_250752 at file /dist1/data/hadoop-data/current/subdir19/blk_4505306716668305567



************************************************************************************
05.hadoop
************************************************************************************
REGION SERVER ->
2010-12-09 04:29:25,047 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size
of data: 250.5m; Skipped 1 file(s), size: 206896367
2010-12-09 04:29:25,047 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction
of 2 file(s) in data of article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.
 into hdfs://00.hadoop.****:54310/hbase/article/95e92e97823e11e57850720a63433a62/.tmp, sequenceid=33050221
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction
of 2 file(s) in data of article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.;
new storefile is hdfs://00.hadoop.****:54310/hbase/article/95e92e97823e11e57850720a63433a62/data/2787768957794924068;
store size is 250.5m
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed
on region article,060c4b75067378c045eead9edec2d4fb7d8f3e5e,1291812563166.95e92e97823e11e57850720a63433a62.
in 1sec
2010-12-09 04:29:26,328 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction
on region article,ed0cdc7ae6db73432e2ca47c973d545bee11ddd8,1291813400028.85a8d31d099f409dd36f3bcc4be1eba6.
2010-12-09 04:29:26,330 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed
on region article,ed0cdc7ae6db73432e2ca47c973d545bee11ddd8,1291813400028.85a8d31d099f409dd36f3bcc4be1eba6.
in 0sec
2010-12-09 04:32:33,081 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-12-09 04:32:33,081 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2010-12-09 04:32:33,089 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_2012253533858803341_250248java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)

2010-12-09 04:32:33,089 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x12ca58c08ff0049, likely server has closed socket, closing sock
et connection and attempting reconnect
2010-12-09 04:32:33,089 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler
5 on 60020 took 75446ms appending an edit to hlog; editcount=50505
2010-12-09 04:32:33,090 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 60020:
exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_2012253533858803341_250248
waiting for responder to exit. 
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020:
exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on
60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 60020:
exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on 60020: exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
2010-12-09 04:32:33,139 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
2010-12-09 04:32:33,140 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: exiting
2010-12-09 04:32:33,142 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
2010-12-09 04:32:33,142 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on 60020: exiting
2010-12-09 04:32:33,143 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on
60020: exiting
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2010-12-09 04:32:33,143 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on
60020: exiting
2010-12-09 04:32:33,144 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030




NAMENODE ->
2010-12-09 04:32:08,408 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile:
failed to create file /hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this
file is already being created by NN_Recovery on 10.100.159.13
2010-12-09 04:32:08,408 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310,
call append(/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555,
DFSClient_hb_m_10.194.194.79:60000_1291788452343) from 10.194.194.79:59118: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file /hba
se/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555 for DFSClient_hb_m_10.194.194.79:60000_1291788452343
on client 10.194.194.79, because this file is already being created by NN_Recovery on 10.100.159.13
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555
for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this
file is already being created by NN_Recovery on 10.100.159.13
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1194)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1282)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:541)
        at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)
2010-12-09 04:32:09,411 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: failed to create file /hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555 for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this file is already being created by NN_Recovery on 10.100.159.13
2010-12-09 04:32:09,411 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call append(/hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555, DFSClient_hb_m_10.194.194.79:60000_1291788452343) from 10.194.194.79:59118: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555 for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this file is already being created by NN_Recovery on 10.100.159.13
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/.logs/05.hadoop.jive,60020,1291862294517/10.100.159.13%3A60020.1291868481555 for DFSClient_hb_m_10.194.194.79:60000_1291788452343 on client 10.194.194.79, because this file is already being created by NN_Recovery on 10.100.159.13
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1194)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1282)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:541)
        at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)



DATANODE ->

2010-12-09 04:32:09,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_2012253533858803341_250248 1 : Thread is interrupted.
2010-12-09 04:32:09,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.100.159.13:50010,
storageID=DS-145997007-10.100.159.13-50010-1291262743182, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Interrupted receiveBlock
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:579)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2010-12-09 04:32:09,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_2012253533858803341_250248 terminating


************************************************************************************
06.hadoop
************************************************************************************
REGION SERVER ->

2010-12-09 06:17:47,215 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,430 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,549 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,557 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:47,563 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:48,619 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,703 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,705 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.
2010-12-09 06:17:49,729 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
article,8d36892f14dfa6d17d08900743960d0b0973fc96,1291812431639.cce2b650859704eb184df8fb5c5eee6f.

A bunch of these ^

Then a little later:
.... 

2010-12-09 06:23:23,561 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for region article,993143fca3613014990cc6975cacdd62f4147e03,12918133247
53.19413f33fc660536d9d4f315a1283440. because: regionserver60020.cacheFlusher
2010-12-09 06:23:27,058 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
Using syncFs -- HDFS-200
2010-12-09 06:23:27,080 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291875047947,
entries=53315,
 filesize=63781925. New hlog /hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291875807006
2010-12-09 06:23:27,080 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs
to remove  out of total 8; oldest outstanding sequenceid is 33526881 from region article
,1221b0013c61fd7446cb822f698d0bcafa487342,1291872610788.bb2aa6ae8c694a096a00e7bb91fc46c3.
2010-12-09 06:23:27,080 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog
file /hbase/.logs/06.hadoop.****,60020,1291864620851/10.99.97.106%3A60020.1291872064424
 whose highest sequenceid is 33506639 to /hbase/.oldlogs/10.99.97.106%3A60020.1291872064424
2010-12-09 06:25:39,162 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler
12 on 60020 took 119034ms appending an edit to hlog; editcount=3683
2010-12-09 06:25:39,162 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
        at sun.nio.ch.IOUtil.write(IOUtil.java:71)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2509)
2010-12-09 06:25:39,163 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_1141147913164105086_258331java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)

2010-12-09 06:25:39,163 INFO org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1141147913164105086_258331
waiting for responder to exit. 
2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_-7076574401277381592_258324java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2621)

2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-7076574401277381592_258324
bad datanode[0] 10.99.97.106:50010
2010-12-09 06:25:39,164 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x12ca58c08ff004b, likely server has closed socket, closing socket connection
and attempting reconnect
2010-12-09 06:25:39,164 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-7076574401277381592_258324
in pipeline 10.99.97.106:50010, 10.100.154.103:50010: bad datanode 10.99.97.106:50010
2010-12-09 06:25:39,165 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1141147913164105086_258331
bad datanode[0] 10.99.97.106:50010
2010-12-09 06:25:39,167 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x12ca58c08ff004c, likely server has closed socket, closing socket connection
and attempting reconnect
2010-12-09 06:25:39,169 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.100.154.103:50020.
Already tried 0 time(s).
2010-12-09 06:25:39,169 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1141147913164105086_258331
in pipeline 10.99.97.106:50010, 10.100.159.13:50010: bad datanode 10.99.97.106:50010
2010-12-09 06:25:39,209 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020:
exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020:
exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 60020:
exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on
60020: exiting
2010-12-09 06:25:39,210 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60020:
exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on
60020: exiting
2010-12-09 06:25:39,214 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on
60020: exiting
2010-12-09 06:25:39,214 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener
on 60020
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on
60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on
60020: exiting
2010-12-09 06:25:39,213 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60020:
exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on
60020: exiting
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2010-12-09 06:25:39,215 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on
60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on
60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on
60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on
60020: exiting
2010-12-09 06:25:39,216 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
infoServer
2010-12-09 06:25:39,223 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0
from primary datanode 10.100.159.13:50010
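
If these do turn out to be long GC pauses, the other knob we are looking at is the ZooKeeper
session timeout, since the master declares a region server dead as soon as its znode expires.
A sketch of what we might put in hbase-site.xml on the region servers (the value is illustrative,
and we have not confirmed what our effective timeout currently is; the "have not heard from
server in 182936ms" message on 04.hadoop suggests it is around 180 seconds, and the ZooKeeper
server's own maximum session timeout would also have to allow the larger value):

  <!-- hbase-site.xml (sketch): allow longer client pauses before the ZK session expires -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>300000</value> <!-- 300s; illustrative value, not a recommendation -->
  </property>

We realize a bigger timeout only masks the pause, so GC tuning / more heap is probably the real fix.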



