hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Exception in hbase 0.92. with DFS, - Bad connect ack
Date Thu, 23 Feb 2012 22:06:11 GMT
Check your switch/link/uplink utilization.
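On Linux datanode hosts, a quick first pass (just a sketch; sysstat/ethtool will give better numbers, and interface names vary per site) is to read the kernel's per-interface counters:

```shell
#!/bin/sh
# Snapshot of per-interface counters from /proc/net/dev (standard kernel
# layout). Rising errs/drop counts, or byte counters moving near line rate
# between two snapshots, point at a saturated link or uplink.
awk 'NR > 2 {
    gsub(":", "", $1)  # strip the trailing colon from the interface name
    printf "%-10s rx_bytes=%s rx_errs=%s rx_drop=%s tx_bytes=%s tx_errs=%s tx_drop=%s\n",
           $1, $2, $4, $5, $10, $12, $13
}' /proc/net/dev
```

Run it twice a few seconds apart and diff the byte counters; `ethtool -S <iface>` on the datanode and the counters on the switch port tell the rest of the story.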
 
HDFS-941 might help. It is not in Hadoop 1.0, according to a cursory search over the branch history in the Git mirror.
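If you do pull HDFS-941 in, the patch as committed adds a client-side socket cache plus a keepalive window on the datanode xceiver. The property names and defaults below are my reading of that patch, so verify them against whatever build you end up with:

```xml
<!-- hdfs-site.xml: names as I understand them from HDFS-941; check your build -->
<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <!-- how long (ms) the datanode keeps an idle xceiver connection open for reuse -->
  <value>1000</value>
</property>
<property>
  <name>dfs.client.socketcache.capacity</name>
  <!-- max sockets cached per DFSClient -->
  <value>16</value>
</property>
```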


As another data point, we see this in production with a Hadoop that is much closer to CDH3, but we have some known issues with the network design in our legacy datacenters and plan to resolve them with an eventual relocation. I'm also integrating HDFS-941.
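For what it's worth, the 66000 ms in the quoted timeout is not arbitrary: in Hadoop 1.0, as far as I can tell, the datanode's connect timeout in the write pipeline is the base dfs.socket.timeout (60 s by default) plus a 3 s extension per remaining downstream datanode, so two downstream nodes gives:

```shell
# 60s base socket timeout + 3s per remaining downstream datanode (2 here)
echo $(( 60000 + 2 * 3000 ))   # prints 66000
```

So the pipeline genuinely waited over a minute for 10.232.83.118 to accept a connection before giving up, which again points at the network rather than at HBase.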


Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



----- Original Message -----
> From: Mikael Sitruk <mikael.sitruk@gmail.com>
> To: user@hbase.apache.org
> Cc: 
> Sent: Thursday, February 23, 2012 1:25 PM
> Subject: Exception in hbase 0.92. with DFS, - Bad connect ack
> 
> Hi
> 
> I see a lot of the following in my HBase logs (the target IP changes):
> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream 10.232.83.87:50010 java.io.IOException: Bad connect
> ack with firstBadLink as 10.232.83.118:50010
> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_4678388308309640326_170570
> 2012-02-23 23:04:02,701 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> datanode 10.232.83.118:50010
> 
> Then checking the HDFS log of the same server (87):
> 2012-02-23 23:04:02,698 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_4678388308309640326_170570 received exception
> java.net.SocketTimeoutException: 66000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=/
> 10.232.83.118:50010]
> 2012-02-23 23:04:02,699 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.232.83.87:50010,
> storageID=DS-1257662823-10.232.83.87-50010-1329398253085, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 66000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending remote=/
> 10.232.83.118:50010]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:319)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>         at java.lang.Thread.run(Thread.java:662)
> 
> 
> The HDFS log on the target server (118) does not seem to show any
> problem around the same time:
> 2012-02-23 23:04:01,648 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.232.83.118:45623, dest: /10.232.83.118:50010, bytes: 67108864, op:
> HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset:
> 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
> blk_-1747243057136009792_170577, duration: 6932047000
> 2012-02-23 23:04:01,649 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
> block blk_-1747243057136009792_170577 terminating
> 2012-02-23 23:04:01,656 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-4467275870825484381_170577 src: /10.232.83.118:45626 dest: /
> 10.232.83.118:50010
> 2012-02-23 23:04:03,467 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_6330134749736235430_170577 src: /10.232.83.114:49175 dest: /
> 10.232.83.118:50010
> 2012-02-23 23:04:05,153 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.232.83.118:50010, dest: /10.232.83.118:45615, bytes: 67633152, op:
> HDFS_READ, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset: 0,
> srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
> blk_-7285361301892533992_165555, duration: 27134342000
> 2012-02-23 23:04:08,569 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.232.83.118:45626, dest: /10.232.83.118:50010, bytes: 67108864, op:
> HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset:
> 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
> blk_-4467275870825484381_170577, duration: 6906584000
> 2012-02-23 23:04:08,570 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
> block blk_-4467275870825484381_170577 terminating
> 2012-02-23 23:04:08,572 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_6927577191995683160_170577 src: /10.232.83.118:45629 dest: /
> 10.232.83.118:50010
> 2012-02-23 23:04:09,283 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7440488846881064366_170577 src: /10.232.83.86:60436 dest: /
> 10.232.83.118:50010
> 
> I have checked the GC logs, but no pauses were noted (all full GC pauses
> <10ms).
> 
> Any idea what the problem might be?
> 
> I am using HBase 0.92.0 and HDFS 1.0.0.
> Thanks
> Mikael.S
> 
