hbase-user mailing list archives

From Frédéric Fondement <frederic.fondem...@uha.fr>
Subject datanode timeout
Date Mon, 25 Jun 2012 09:00:43 GMT
Hi all!

I'm running into trouble with my HBase cluster: the following error appears more
and more often (every 2 to 15 minutes on each node):

2012-06-25 10:25:30,646 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):Got exception while serving blk_4839251368515801234_555101 to /10.120.0.5:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)

2012-06-25 10:25:30,646 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.120.0.5:50010, storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010 remote=/10.120.0.5:42564]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)



You might have guessed that the local machine is 10.120.0.5. Unsurprisingly,
the process on port 50010 is the datanode. Port 42564 changes from one error
instance to the next, and seems to correspond to the regionserver process. If
I list the processes connected to port 50010 with 'lsof -i :50010', I get an
impressive number of sockets (around 400). Is that normal?
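
For reference, here is roughly how I am counting them (assuming a stock lsof;
the column layout can vary between versions):

    $ lsof -i :50010 | wc -l
    # total lines, roughly one per socket, plus one header line
    $ lsof -nP -i :50010 | awk 'NR>1 {print $NF}' | sort | uniq -c | sort -rn
    # groups the connections by state (ESTABLISHED, CLOSE_WAIT, ...)
    # to see what is actually holding them open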

I should add that the current load (requests, I/O, CPU, ...) is rather low.

I can't find any other errors in the namenode or regionserver logs.
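
In case it matters: if I read the Hadoop defaults correctly, the 480000 ms in
the trace is the stock DFS write timeout, which we have never changed. It is
controlled by this hdfs-site.xml property (shown here at its default; I have
seen setting it to 0, i.e. no timeout, suggested for HBase clusters, but I
have not tried that yet):

    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <!-- milliseconds; 0 disables the write timeout entirely -->
      <value>480000</value>
    </property>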

All the best,

Frédéric.

