hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Dying region servers...
Date Sat, 21 Feb 2009 05:45:03 GMT
On Fri, Feb 20, 2009 at 2:49 PM, Larry Compton <lawrence.compton@gmail.com> wrote:

> I'm having problems with my region servers dying. Region server and data
> node log snippets are found below. Here's a synopsis of my configuration...
> - 4 nodes
> - Hadoop/Hbase 0.19.0
> - dfs.datanode.max.xcievers - 2048
> - dfs.datanode.socket.write.timeout - 0
> - file handle limit - 32768
> - fsck - healthy


Thanks for reporting that you have the above configured.  What size are your
table, regions, and rows?

Is dfs.datanode.socket.write.timeout=0 set in a context where hbase can see
it?  I.e., is it in hbase-site.xml, or is it in hadoop-site.xml symlinked
under the hbase/conf dir so hbase picks it up?  Going by the errors below,
its absence could be the explanation.
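
For example, a minimal sketch of making the setting visible to the
HBase-side DFSClient; add the property to hbase-site.xml directly (the
property name and value mirror what you list above):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

Or symlink your Hadoop config under the HBase conf dir so HBase reads the
same file (paths below are placeholders for your install locations):

  ln -s /path/to/hadoop/conf/hadoop-site.xml /path/to/hbase/conf/hadoop-site.xml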

Yours,
St.Ack


>
> I'm seeing DataXceiver errors in the data node log, but not the sort that
> indicates that the max.xcievers value is too small. Any idea what might be
> wrong?
>
> HBASE REGION SERVER LOG OUTPUT...
> 2009-02-20 08:50:42,476 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.6.38:56737 remote=/192.168.6.38:50010]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-02-20 08:50:42,918 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.6.38:56646 remote=/192.168.6.38:50010]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_2604922956617757726_298427 bad datanode[0] 192.168.6.38:50010
> 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3747640666687562371_298377 bad datanode[0] 192.168.6.38:50010
> 2009-02-20 08:50:44,356 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,357 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region medline,_X2dX5031454eX3aX11f48751c5eX3aXX2dX725c,1235136902878
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,377 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
> 2009-02-20 08:50:44,377 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,378 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region medline,"blood",1235125955035
> 2009-02-20 08:50:44,380 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020, call batchUpdates([B@ecb0da, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@14ed87c) from 192.168.6.29:47457: error: java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,418 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=2581, regions=71, stores=212, storefiles=352, storefileIndexSize=31, memcacheSize=574, usedHeap=1190, maxHeap=1984
>
> DATANODE LOG OUTPUT...
> 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3747640666687562371_298377 0 Exception java.net.SocketException: Broken pipe
>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:797)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-3747640666687562371_298377 terminating
> 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-3747640666687562371_298377 received exception java.io.EOFException: while trying to read 32873 bytes
> 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_2604922956617757726_298427 0 Exception java.net.SocketException: Broken pipe
>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>        at java.net.SocketOutputStream.write(SocketOutputStream.java:115)
>        at java.io.DataOutputStream.writeShort(DataOutputStream.java:150)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:798)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2009-02-20 08:50:45,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_2604922956617757726_298427 terminating
> 2009-02-20 08:50:45,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2604922956617757726_298427 received exception java.io.EOFException: while trying to read 49299 bytes
> 2009-02-20 08:50:45,342 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.6.38:50010, dest: /192.168.6.38:56791, bytes: 3318, op: HDFS_READ, cliID: DFSClient_1697856093, srvID: DS-697440498-192.168.6.38-50010-1233008986086, blockid: blk_-4029959142608094898_296648
> 2009-02-20 08:50:46,680 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 32873 bytes
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>        at java.lang.Thread.run(Thread.java:619)
> 2009-02-20 08:50:46,680 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 49299 bytes
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>        at java.lang.Thread.run(Thread.java:619)
>
> Larry
>
