hadoop-mapreduce-user mailing list archives

From Pablo Musa <pa...@psafe.com>
Subject Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Date Sun, 10 Mar 2013 18:33:31 GMT
This variable was already set:
<property>
   <name>dfs.datanode.max.xcievers</name>
   <value>4096</value>
   <final>true</final>
</property>

Should I increase it further?
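
One way to check whether the DataNode is actually approaching that limit
is to count its DataXceiver threads (a rough sketch; it assumes jstack
runs as the same user as the DataNode and that the pgrep pattern matches
only the DataNode JVM):

]$ jstack $(pgrep -f DataNode) | grep -c DataXceiver

If that count stays far below 4096, the xceiver limit is probably not
the bottleneck.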

The same error happens every 5-8 minutes on datanode 172.17.2.18.

2013-03-10 15:26:42,818 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
PSLBHDN002:50010:DataXceiver error processing READ_BLOCK operation  src: 
/172.17.2.18:46422 dest: /172.17.2.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:46422]
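
For reference, 480000 ms matches the default of dfs.datanode.socket.write.timeout:
the DataNode waited a full 8 minutes for the reading client to drain the
channel before giving up. If slow readers are expected (e.g. HBase scanners
holding streams open for a long time), this timeout can be raised; a sketch,
with the value being just an example:

<property>
   <name>dfs.datanode.socket.write.timeout</name>
   <value>960000</value>
</property>

Note that this only hides slow clients, it does not explain why they are slow.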


]$ lsof | wc -l
2393

]$ lsof | grep hbase | wc -l
4

]$ lsof | grep hdfs | wc -l
322

]$ lsof | grep hadoop | wc -l
162

]$ cat /proc/sys/fs/file-nr
4416    0    7327615

]$ date
Sun Mar 10 15:31:47 BRT 2013


What could be the causes? How can I extract more info about the error?
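
In the meantime I will try to gather more detail; a sketch of the plan
(assuming a standard log4j setup): count the connections on the DataXceiver
port when the error fires, and raise the DataNode's log level:

]$ netstat -tan | grep ':50010' | wc -l

# in the DataNode's log4j.properties:
log4j.logger.org.apache.hadoop.hdfs.server.datanode=DEBUG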

Thanks,
Pablo


On 03/08/2013 09:57 PM, Abdelrahman Shettia wrote:
> Hi,
>
> If the open-files limits (for both the hbase and hdfs users) are already
> set to more than 30K, please change dfs.datanode.max.xcievers to more
> than the value below.
>
> <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>2096</value>
>    <description>PRIVATE CONFIG VARIABLE</description>
> </property>
>
> Try increasing this value and tuning it to the HBase usage.
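>
> To confirm the value the DataNode host will actually load, hdfs getconf
> reads the effective configuration files (assuming your Hadoop version
> supports -confKey; note it shows what a restart would pick up, not what
> the running daemon is using):
>
> ]$ hdfs getconf -confKey dfs.datanode.max.xcievers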
>
>
> Thanks
>
> -Abdelrahman
>
>
>
>
>
>
> On Fri, Mar 8, 2013 at 9:28 AM, Pablo Musa <pablo@psafe.com> wrote:
>
>     I am also having this issue and tried a lot of solutions, but
>     could not solve it.
>
>     ]# ulimit -n ** running as root and hdfs (datanode user)
>     32768
>
>     ]# cat /proc/sys/fs/file-nr
>     2080    0    8047008
>
>     ]# lsof | wc -l
>     5157
>
>     Sometimes this issue even happens from a node to that same node :(
>
>     I also think this issue is messing with my regionservers, which are
>     crashing all day long!!
>
>     Thanks,
>     Pablo
>
>
>     On 03/08/2013 06:42 AM, Dhanasekaran Anbalagan wrote:
>>     Hi Varun
>>
>>     I believe this is not a ulimit issue.
>>
>>
>>     /etc/security/limits.conf
>>     # End of file
>>     *               -      nofile  1000000
>>     *               -      nproc 1000000
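>>
>>     (limits.conf only applies to new login sessions, so the limit the
>>     daemon user (hdfs in a typical setup) actually gets can be
>>     double-checked with something like this, assuming that account has
>>     a login shell:
>>
>>     ]$ su - hdfs -c 'ulimit -Hn' )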
>>
>>
>>     Please guide me, guys, I want to fix this. Please share your
>>     thoughts on this DataXceiver error.
>>
>>     Did I learn something today? If not, I wasted it.
>>
>>
>>     On Fri, Mar 8, 2013 at 3:50 AM, varun kumar <varun.uid@gmail.com> wrote:
>>
>>         Hi Dhana,
>>
>>         Increase the ulimit for all the datanodes.
>>
>>         If you start the service as the hadoop user, increase the
>>         ulimit value for the hadoop user.
>>
>>         Make the changes in the following file:
>>
>>         /etc/security/limits.conf
>>
>>         Example:
>>         hadoop          soft    nofile    35000
>>         hadoop          hard    nofile    35000
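>>
>>         (The new limit only takes effect for sessions started after the
>>         change, so restart the DataNode from a fresh login. A quick
>>         sketch to confirm that a running DataNode picked it up, assuming
>>         pgrep -f matches only the DataNode JVM:
>>
>>         ]$ grep 'open files' /proc/$(pgrep -f DataNode)/limits )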
>>
>>         Regards,
>>         Varun Kumar.P
>>
>>         On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan
>>         <bugcy013@gmail.com> wrote:
>>
>>             Hi Guys
>>
>>             I am frequently getting is error in my Data nodes.
>>
>>             Please guide me on what the exact problem is.
>>
>>             dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010
>>
>>             java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.30.138:34280 remote=/172.16.30.140:50010]
>>
>>             at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>             at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
>>             at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
>>             at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115)
>>             at java.io.FilterInputStream.read(FilterInputStream.java:66)
>>             at java.io.FilterInputStream.read(FilterInputStream.java:66)
>>             at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160)
>>             at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405)
>>             at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>>             at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>>             at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
>>             at java.lang.Thread.run(Thread.java:662)
>>
>>             dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010
>>
>>             java.io.EOFException: while trying to read 65563 bytes
>>
>>             at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408)
>>             at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452)
>>             at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511)
>>             at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748)
>>             at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462)
>>             at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>>             at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>>             at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
>>             at java.lang.Thread.run(Thread.java:662)
>>
>>             How do I resolve this?
>>
>>             -Dhanasekaran.
>>
>>             Did I learn something today? If not, I wasted it.
>>
>>         -- 
>>         Regards,
>>         Varun Kumar.P
>>
>>
>
>

