From: Pablo Musa
Date: Sun, 10 Mar 2013 15:33:31 -0300
To: user@hadoop.apache.org
Subject: Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Message-ID: <513CD1FB.7090409@psafe.com>

This variable was already set:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
  <final>true</final>
</property>

Should I increase it more?
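
If I do bump it further, I assume I should also set it under the post-rename 2.x property name, dfs.datanode.max.transfer.threads (my understanding is that dfs.datanode.max.xcievers survives only as a deprecated alias); a minimal sketch, with a purely illustrative value:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <!-- illustrative value only, not a recommendation -->
  <value>8192</value>
</property>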

The same error happens every 5-8 minutes on datanode 172.17.2.18.

2013-03-10 15:26:42,818 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: PSLBHDN002:50010:DataXceiver error processing READ_BLOCK operation  src: /172.17.2.18:46422 dest: /172.17.2.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 remote=/172.17.2.18:46422]
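
The 480000 ms in that message matches the datanode's socket write timeout, so as a stopgap while I look for the real cause I assume the knob would be dfs.datanode.socket.write.timeout in hdfs-site.xml (property name as I understand it for this Hadoop generation; the value below is only an example, not a recommendation):

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- example only: doubles the 8-minute default -->
  <value>960000</value>
</property>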


]$ lsof | wc -l
2393

]$ lsof | grep hbase | wc -l
4

]$ lsof | grep hdfs | wc -l
322

]$ lsof | grep hadoop | wc -l
162

]$ cat /proc/sys/fs/file-nr
4416    0    7327615

]$ date
Sun Mar 10 15:31:47 BRT 2013
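
To get more data at the moment the error fires, I am thinking of also capturing the live transfer-thread and connection counts on the datanode, roughly like the sketch below (assuming the DataNode JVM is visible to jps and still names its worker threads DataXceiver; <pid> stands for the pid printed by the first command):

]$ sudo -u hdfs jps | awk '/DataNode/ {print $1}'          # pid of the DataNode JVM
]$ sudo -u hdfs jstack <pid> | grep -c DataXceiver         # live transfer threads
]$ netstat -an | grep ':50010' | grep ESTABLISHED | wc -l  # connections on the data port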


What can be the causes? How could I extract more info about the error?

Thanks,
Pablo


On 03/08/2013 09:57 PM, Abdelrahman Shettia wrote:
Hi, 

If the open-file limits for both the hbase and hdfs users are already set to more than 30K, please change dfs.datanode.max.xcievers to more than the value below.

<property>

   <name>dfs.datanode.max.xcievers</name>

   <value>2096</value>

   <description>PRIVATE CONFIG VARIABLE</description>

</property>

Try increasing this one and tuning it to the HBase usage.


Thanks

-Abdelrahman






On Fri, Mar 8, 2013 at 9:28 AM, Pablo Musa <pablo@psafe.com> wrote:
I am also having this issue and tried a lot of solutions, but could not solve it.

]# ulimit -n ** running as root and hdfs (datanode user)
32768

]# cat /proc/sys/fs/file-nr
2080    0    8047008

]# lsof | wc -l
5157

Sometimes this issue happens from one node to the same node :(

I also think this issue is messing with my regionservers which are crashing all day long!!

Thanks,
Pablo


On 03/08/2013 06:42 AM, Dhanasekaran Anbalagan wrote:
Hi Varun

I believe it is not a ulimit issue.


/etc/security/limits.conf
# End of file
*    -    nofile    1000000
*    -    nproc     1000000


Please guide me, guys; I want to fix this. Please share your thoughts on this DataXceiver error.

Did I learn something today? If not, I wasted it.


On Fri, Mar 8, 2013 at 3:50 AM, varun kumar <varun.uid@gmail.com> wrote:
Hi Dhana,

Increase the ulimit for all the datanodes.

If you are starting the service as the hadoop user, increase the ulimit value for the hadoop user.

Make the changes in the following file.

/etc/security/limits.conf

Example:-
hadoop    soft    nofile    35000
hadoop    hard    nofile    35000
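
A quick way to confirm the new limit actually reached the daemon (the limits.conf change only applies to sessions started afterwards, so the datanode has to be restarted) might be a sketch like this, with <datanode_pid> standing in for the DataNode's process id:

su - hadoop -c 'ulimit -n'                                 # limit a fresh hadoop login shell gets
cat /proc/<datanode_pid>/limits | grep 'Max open files'    # limit of the running process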

Regards,
Varun Kumar.P

On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan <bugcy013@gmail.com> wrote:
Hi Guys

I am frequently getting this error on my datanodes.

Please help me understand what the exact problem is.

dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010



java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010]





at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)





at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160)





at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)





at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
at java.lang.Thread.run(Thread.java:662)

                                        
dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010



java.io.EOFException: while trying to read 65563 bytes


at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511)





at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)





at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
at java.lang.Thread.run(Thread.java:662)


How do I resolve this?

-Dhanasekaran.

Did I learn something today? If not, I wasted it.

--



--
Regards,
Varun Kumar.P



