Subject: Re: rolling upgrade(2.4.1 to 2.6.0) problem
From: Drake민영근 <drake.min@nexr.com>
To: user <user@hadoop.apache.org>, 조주일 <tjstory@kgrid.co.kr>
Date: Fri, 24 Apr 2015 17:41:59 +0900

Hi,

I think you are limited by "max user processes". See:
https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread

In your case, the user cannot create more than 10240 processes. In our environment the limit is more like 65000, so I think it is worth a try. Also, if the HDFS datanode daemon's user is not root, put the limit file into /etc/security/limits.d.
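For example, a minimal sketch of such a limits file (the file name, user name, and values below are placeholders; it assumes the datanode runs as a user named "hdfs", so adjust them to your environment):

    # /etc/security/limits.d/90-hdfs.conf
    # Raise the max user processes (nproc) limit for the datanode user.
    hdfs    soft    nproc     65536
    hdfs    hard    nproc     65536
    # Raising the open files limit at the same time is also common.
    hdfs    soft    nofile    102400
    hdfs    hard    nofile    102400

The new limits only apply to new sessions, so restart the datanode afterwards and verify with "ulimit -u" as that user.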
Thanks.

Drake민영근 Ph.D
kt NexR

On Fri, Apr 24, 2015 at 5:15 PM, 조주일 <tjstory@kgrid.co.kr> wrote:

> ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 62580
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 102400
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 10240
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> ------------------------------------------------------
>
> The Hadoop cluster was operating normally on version 2.4.1.
> The cluster has problems on version 2.6.
>
> E.g., slow BlockReceiver logs are seen often:
> "org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost"
>
> If a datanode fails and under-replicated blocks occur, the heartbeat checks of many other nodes fail as well.
> So I stop all nodes and start all nodes, and the cluster is then normalized.
>
> In this regard, is there a difference between Hadoop versions 2.4 and 2.6?
>
> -----Original Message-----
> From: "Drake민영근" <drake.min@nexr.com>
> To: "user" <user@hadoop.apache.org>; "조주일" <tjstory@kgrid.co.kr>;
> Cc:
> Sent: 2015-04-24 (Fri) 16:58:46
> Subject: Re: rolling upgrade(2.4.1 to 2.6.0) problem
>
> Hi,
>
> How about the ulimit setting of the user for the HDFS datanode?
>
> Drake민영근 Ph.D
> kt NexR
>
> On Wed, Apr 22, 2015 at 6:25 PM, 조주일 <tjstory@kgrid.co.kr> wrote:
>
> I allocated 5 GB.
> I don't think OOM is the essential cause.
>
> -----Original Message-----
> From: "Han-Cheol Cho" <hancheol.cho@nhn-playart.com>
> To: <user@hadoop.apache.org>;
> Cc:
> Sent: 2015-04-22 (Wed) 15:32:35
> Subject: RE: rolling upgrade(2.4.1 to 2.6.0) problem
>
> Hi,
>
> The first warning shows an out-of-memory error of the JVM.
> Did you give enough max heap memory to the DataNode daemons?
> DN daemons use a max heap size of 1 GB by default, so if your DN requires more than that, it will be in trouble.
>
> You can check the memory consumption of your DN daemons (e.g., with the top command) and the memory allocated to them by the -Xmx option (e.g., jps -lmv).
> If the max heap size is too small, you can use the HADOOP_DATANODE_OPTS variable (e.g., HADOOP_DATANODE_OPTS="-Xmx4g") to override it. (A minimal sketch of these checks appears at the end of this message.)
>
> Best wishes,
> Han-Cheol
>
> -----Original Message-----
> From: "조주일" <tjstory@kgrid.co.kr>
> To: <user@hadoop.apache.org>;
> Cc:
> Sent: 2015-04-22 (Wed) 14:54:16
> Subject: rolling upgrade(2.4.1 to 2.6.0) problem
>
> My cluster is:
> hadoop 2.4.1
> Capacity: 1.24 PB
> Used: 1.1 PB
> 16 datanodes
> Each node has a capacity of 65 TB, 96 TB, 80 TB, etc.
>
> I had to proceed with the rolling upgrade from 2.4.1 to 2.6.0.
> Upgrading one datanode takes about 40 minutes.
> Under-replicated blocks occur while the upgrade is in progress.
>
> 10 nodes completed the upgrade to 2.6.0.
> At some point during the rolling upgrade of the remaining nodes, a problem occurred.
>
> The heartbeats of many nodes (2.6.0 only) failed.
>
> I changed the following properties, but that did not fix the problem:
> dfs.datanode.handler.count = 100 ---> 300, 400, 500
> dfs.datanode.max.transfer.threads = 4096 ---> 8000, 10000
>
> What I think happens:
> 1. Something causes a delay in processing threads; it may be the block replication between different versions.
> 2. Because of this, many more handlers and xceivers become necessary.
> 3. Because of this, an out-of-memory error occurs, or the problem arises on a datanode.
> 4. Heartbeats fail and the datanode dies.
>
> I found the following datanode error log, but I cannot determine the cause from it.
>
> I think it is because of the block replication between different versions.
>
> Someone please help me!!
>
> DATANODE LOG
> --------------------------------------------------------------------------
> ### I found a few thousand CLOSE_WAIT connections on the datanode.
>
> org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 1207ms (threshold=300ms)
>
> 2015-04-21 22:46:01,772 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:640)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:145)
>         at java.lang.Thread.run(Thread.java:662)
> 2015-04-21 22:49:45,378 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.207:40010:DataXceiverServer:java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
>         at java.lang.Thread.run(Thread.java:662)
> 2015-04-22 01:01:25,632 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.207:40010:DataXceiverServer:java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
>         at java.lang.Thread.run(Thread.java:662)
> 2015-04-22 03:49:44,125 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-192.168.1.204:40010:DataXceiver error processing READ_BLOCK operation src: /192.168.2.174:45606 dst: /192.168.1.204:40010
> java.io.IOException: cannot find BPOfferService for bpid=BP-1770955034-0.0.0.0-1401163460236
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.getDNRegistrationForBP(DataNode.java:1387)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:470)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)
> 2015-04-22 05:30:28,947 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.203, datanodeUuid=654f22ef-84b3-4ecb-a959-2ea46d817c19, infoPort=40075, ipcPort=40020, storageInfo=lv=-56;cid=CID-CLUSTER;nsid=239138164;c=1404883838982):Failed to transfer BP-1770955034-0.0.0.0-1401163460236:blk_1075354042_1613403 to 192.168.2.156:40010 got
> java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:405)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:506)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:728)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2017)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Connection reset by peer
>         ... 8 more
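A minimal sketch of the heap checks Han-Cheol suggested above (the paths, user, and heap value are placeholders assuming a typical Hadoop 2.x tarball layout; adjust them to your environment):

    # Show each DataNode JVM with its full command line, including any -Xmx flag.
    jps -lmv | grep -i datanode

    # Watch the process's actual memory use (RES column in top).
    top -p "$(jps | awk '/DataNode/{print $1}')"

    # In $HADOOP_CONF_DIR/hadoop-env.sh, override the default 1 GB heap, e.g.:
    export HADOOP_DATANODE_OPTS="-Xmx4g $HADOOP_DATANODE_OPTS"

    # Restart the DataNode so the new heap size takes effect.
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode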