hbase-user mailing list archives

From Larry Compton <lawrence.comp...@gmail.com>
Subject Dying region servers...
Date Fri, 20 Feb 2009 22:49:08 GMT
I'm having problems with my region servers dying. Region server and data
node log snippets are found below. Here's a synopsis of my configuration...
- 4 nodes
- Hadoop/HBase 0.19.0
- dfs.datanode.max.xcievers - 2048
- dfs.datanode.socket.write.timeout - 0
- file handle limit - 32768
- fsck - healthy
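
For reference, the two HDFS settings above are set in hdfs-site.xml roughly like this (a sketch of my config; property names as they exist in Hadoop 0.19):

```xml
<!-- hdfs-site.xml (excerpt) -->
<configuration>
  <!-- Raise the per-datanode transceiver thread cap (note the
       historical misspelling "xcievers" in the property name). -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>
  <!-- 0 disables the datanode's write-side socket timeout. -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>
</configuration>
```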

I'm seeing DataXceiver errors in the data node log, but not the sort that
indicates the max.xcievers value is too small. Any idea what might be
wrong?
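
In case it helps, here's a rough sketch of the checks I've been running to rule out resource exhaustion (the datanode PID below is a placeholder):

```shell
# Confirm the open-file-handle limit the datanode process actually sees
# (should report the 32768 configured above, not the default 1024):
ulimit -n

# Count live transceiver threads on the datanode and compare the count
# against dfs.datanode.max.xcievers (2048 here). Replace <dn_pid> with
# the datanode's JVM process id:
# jstack <dn_pid> | grep -c 'DataXceiver'
```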

HBASE REGION SERVER LOG OUTPUT...
2009-02-20 08:50:42,476 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 5000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.6.38:56737 remote=/
192.168.6.38:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-02-20 08:50:42,918 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 5000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.6.38:56646 remote=/
192.168.6.38:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_2604922956617757726_298427 bad datanode[0]
192.168.6.38:50010
2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-3747640666687562371_298377 bad datanode[0]
192.168.6.38:50010
2009-02-20 08:50:44,356 FATAL org.apache.hadoop.hbase.regionserver.HLog:
Could not append. Requesting close of log
java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-20 08:50:44,357 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
failed for region
medline,_X2dX5031454eX3aX11f48751c5eX3aXX2dX725c,1235136902878
java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-20 08:50:44,377 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All
datanodes 192.168.6.38:50010 are bad. Aborting...
2009-02-20 08:50:44,377 FATAL
org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-20 08:50:44,378 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting  compaction on region medline,"blood",1235125955035
2009-02-20 08:50:44,380 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020, call batchUpdates([B@ecb0da,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@14ed87c) from 192.168.6.29:47457:
error: java.io.IOException: All datanodes 192.168.6.38:50010 are bad.
Aborting...
java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-20 08:50:44,418 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=2581, regions=71, stores=212, storefiles=352, storefileIndexSize=31,
memcacheSize=574, usedHeap=1190, maxHeap=1984

DATANODE LOG OUTPUT...
2009-02-20 08:50:45,337 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-3747640666687562371_298377 0 Exception java.net.SocketException: Broken
pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:797)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
        at java.lang.Thread.run(Thread.java:619)

2009-02-20 08:50:45,337 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-3747640666687562371_298377 terminating
2009-02-20 08:50:45,337 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-3747640666687562371_298377 received exception java.io.EOFException:
while trying to read 32873 bytes
2009-02-20 08:50:45,337 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_2604922956617757726_298427 0 Exception java.net.SocketException: Broken
pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:115)
        at java.io.DataOutputStream.writeShort(DataOutputStream.java:150)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:798)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
        at java.lang.Thread.run(Thread.java:619)

2009-02-20 08:50:45,338 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_2604922956617757726_298427 terminating
2009-02-20 08:50:45,338 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_2604922956617757726_298427 received exception java.io.EOFException:
while trying to read 49299 bytes
2009-02-20 08:50:45,342 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
192.168.6.38:50010, dest: /192.168.6.38:56791, bytes: 3318, op: HDFS_READ,
cliID: DFSClient_1697856093, srvID:
DS-697440498-192.168.6.38-50010-1233008986086, blockid:
blk_-4029959142608094898_296648
2009-02-20 08:50:46,680 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 32873 bytes
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)
2009-02-20 08:50:46,680 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 49299 bytes
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)

Larry
