hadoop-user mailing list archives

From 조주일 <tjst...@kgrid.co.kr>
Subject rolling upgrade(2.4.1 to 2.6.0) problem
Date Wed, 22 Apr 2015 05:54:16 GMT
 
My cluster:
hadoop 2.4.1
Capacity: 1.24 PB
Used: 1.1 PB
16 datanodes
Each node has a capacity of 65 TB, 96 TB, 80 TB, etc.
 
I am performing a rolling upgrade from 2.4.1 to 2.6.0.
Upgrading one datanode takes about 40 minutes.
Under-replicated blocks occur while the upgrade is in progress.
 
10 nodes have completed the upgrade to 2.6.0.
A problem occurred at some point during the rolling upgrade of the remaining nodes.
 
Heartbeats from many of the nodes (2.6.0 nodes only) have failed.
 
I changed the following properties, but that did not fix the problem:
dfs.datanode.handler.count = 100 -> 300, 400, 500
dfs.datanode.max.transfer.threads = 4096 -> 8000, 10000
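
For reference, here is a sketch of how these properties were set in hdfs-site.xml (standard HDFS property names; the values shown are just the last ones I tried, and each datanode was restarted for them to take effect):

  <!-- hdfs-site.xml: last values tried for the two properties above -->
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>500</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>10000</value>
  </property>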
 
My theory:
1. Something is causing a delay in thread processing; I think it may be the block replication between the different versions.
2. Because of that, many more handlers and xceivers become necessary.
3. That leads to out-of-memory errors, or other problems arise on the datanode.
4. The heartbeat fails, and the datanode dies.

I found the following datanode error logs.
However, I cannot determine the cause from them.
 
Again, I suspect the cause is block replication between the different versions.

 
Can someone please help me?!
 
DATANODE LOG
--------------------------------------------------------------------------
### I observed a few thousand CLOSE_WAIT connections on the datanode.
 
org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror
took 1207ms (threshold=300ms)
 
2015-04-21 22:46:01,772 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is
out of memory. Will retry in 30 seconds.
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:640)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:145)
        at java.lang.Thread.run(Thread.java:662)
		
		
2015-04-21 22:49:45,378 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.207:40010:DataXceiverServer:java.io.IOException:
Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:662)		
		
		
2015-04-22 01:01:25,632 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.207:40010:DataXceiverServer:java.io.IOException:
Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:662)
		
		
2015-04-22 03:49:44,125 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.204:40010:DataXceiver
error processing READ_BLOCK operation  src: /192.168.2.174:45606 dst: /192.168.1.204:40010
java.io.IOException: cannot find BPOfferService for bpid=BP-1770955034-0.0.0.0-1401163460236
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getDNRegistrationForBP(DataNode.java:1387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:470)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:662)
		
		
2015-04-22 05:30:28,947 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.203,
datanodeUuid=654f22ef-84b3-4ecb-a959-2ea46d817c19, infoPort=40075, ipcPort=40020, storageInfo=lv=-56;cid=CID-CLUSTER;nsid=239138164;c=1404883838982):Failed
to transfer BP-1770955034-0.0.0.0-1401163460236:blk_1075354042_1613403 to 192.168.2.156:40010
got
java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:405)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:506)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:728)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2017)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
        ... 8 more
 
 