hadoop-mapreduce-user mailing list archives

From Pablo Musa <pa...@psafe.com>
Subject Re: DataXceiver error processing WRITE_BLOCK operation src: /x.x.x.x:50373 dest: /x.x.x.x:50010
Date Tue, 12 Mar 2013 15:52:11 GMT
I am seeing some GC pauses (around 70 seconds), but I don't think they could
cause a 480-second timeout. And it is even stranger when it happens from one
datanode to ITSELF.
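
For reference, the 480000 ms in the exception matches the datanode's socket
write timeout. If we ever decide to simply raise it rather than fix the root
cause, I understand (untested here; property names and defaults as I recall
them) it would look something like this in hdfs-site.xml:

<property>
  <!-- ms the datanode waits for the channel to be ready for write; default 480000 -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
<property>
  <!-- socket read timeout (ms) used by clients and datanodes; default 60000 -->
  <name>dfs.socket.timeout</name>
  <value>120000</value>
</property>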

 > Socket is ready for receiving, but the client closed abnormally, so you
 > generally get this error.

What would "abnormally" mean in this case?

 > xcievers: 4096 is enough, and I don't think you pasted the full stack
 > exception.

The full log excerpt follows.

Thanks very much for the help,
Pablo Musa

2013-03-12 09:41:52,779 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/172.17.2.18:50010, dest: /172.17.2.18:43364, bytes: 66564, op: 
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1549283955_26, offset: 
66393088, srvID: DS-229334310-172.17.2.18-50010-1328651636364, blockid: 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577, 
duration: 24309480
2013-03-12 09:41:52,810 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/172.17.2.18:50010, dest: /172.17.2.18:43364, bytes: 66564, op: 
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1549283955_26, offset: 
66458624, srvID: DS-229334310-172.17.2.18-50010-1328651636364, blockid: 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577, 
duration: 24791908

...

2013-03-12 11:57:54,176 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/172.17.2.18:50010, dest: /172.17.2.18:45037, bytes: 66564, op: 
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1549283955_26, offset: 2755072, 
srvID: DS-229334310-172.17.2.18-50010-1328651636364, blockid: 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577, 
duration: 26533296

...

2013-03-12 12:12:56,524 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_6121120387190865802_12522001
2013-03-12 12:12:56,844 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_7798078179913116741_9709757
2013-03-12 12:12:57,412 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45063]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:12:57,412 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/172.17.2.18:50010, dest: /172.17.2.18:45063, bytes: 594432, op: 
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1549283955_26, offset: 2886144, 
srvID: DS-229334310-172.17.2.18-50010-1328651636364, blockid: 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577, 
duration: 480311786486
2013-03-12 12:12:57,412 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(172.17.2.18, 
storageID=DS-229334310-172.17.2.18-50010-1328651636364, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-26cd999e-460a-4dbc-b940-9250a76930a8;nsid=276058127;c=1362491004838):Got 
exception while serving 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577 
to /172.17.2.18:45063
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45063]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:12:57,412 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
PSLBHDN002:50010:DataXceiver error processing READ_BLOCK operation src: 
/172.17.2.18:45063 dest: /172.17.2.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45063]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:12:58,043 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_8022508854015956034_21426598
2013-03-12 12:12:58,069 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_-5102464265454077361_17771877
2013-03-12 12:12:58,443 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_-7350069832338632205_21397596

...

2013-03-12 12:37:21,267 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_9101061522956099413_17372672
2013-03-12 12:37:21,298 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_-2427596758655123110_10847650
2013-03-12 12:37:21,310 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_5607661776053432519_17155914
2013-03-12 12:37:21,323 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45213]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:37:21,323 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/172.17.2.18:50010, dest: /172.17.2.18:45213, bytes: 528384, op: 
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1549283955_26, offset: 9052672, 
srvID: DS-229334310-172.17.2.18-50010-1328651636364, blockid: 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577, 
duration: 480102794116
2013-03-12 12:37:21,323 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(172.17.2.18, 
storageID=DS-229334310-172.17.2.18-50010-1328651636364, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-26cd999e-460a-4dbc-b940-9250a76930a8;nsid=276058127;c=1362491004838):Got 
exception while serving 
BP-43236042-172.17.2.10-1362490844340:blk_7228654423351524558_25176577 
to /172.17.2.18:45213
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45213]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:37:21,323 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
PSLBHDN002:50010:DataXceiver error processing READ_BLOCK operation src: 
/172.17.2.18:45213 dest: /172.17.2.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010 
remote=/172.17.2.18:45213]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:510)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:344)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
         at java.lang.Thread.run(Thread.java:722)
2013-03-12 12:37:21,326 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_-6873281681928280553_12192883
2013-03-12 12:37:21,342 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
Verification succeeded for 
BP-43236042-172.17.2.10-1362490844340:blk_-6420939594294632128_2665052



On 03/10/2013 11:23 PM, Azuryy Yu wrote:
> xcievers: 4096 is enough, and I don't think you pasted the full stack
> exception.
> Socket is ready for receiving, but the client closed abnormally, so you
> generally get this error.
>
>
> On Mon, Mar 11, 2013 at 2:33 AM, Pablo Musa <pablo@psafe.com> wrote:
>
>     This variable was already set:
>     <property>
>       <name>dfs.datanode.max.xcievers</name>
>       <value>4096</value>
>       <final>true</final>
>     </property>
>
>     Should I increase it more?
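>
>     To check whether the xceiver limit is actually being hit, I suppose a
>     thread dump of the datanode JVM would show the number of active
>     DataXceiver threads. An untested sketch (pid looked up via jps):
>
>     ]# jstack $(jps | awk '/DataNode/ {print $1}') | grep -c DataXceiver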
>
>     The same error keeps happening every 5-8 minutes on datanode 172.17.2.18.
>
>     2013-03-10 15:26:42,818 ERROR
>     org.apache.hadoop.hdfs.server.datanode.DataNode:
>     PSLBHDN002:50010:DataXceiver error processing READ_BLOCK
>     operation  src: /172.17.2.18:46422 dest: /172.17.2.18:50010
>     java.net.SocketTimeoutException: 480000 millis timeout while
>     waiting for channel to be ready for write. ch :
>     java.nio.channels.SocketChannel[connected local=/172.17.2.18:50010
>     remote=/172.17.2.18:46422]
>
>
>     ]$ lsof | wc -l
>     2393
>
>     ]$ lsof | grep hbase | wc -l
>     4
>
>     ]$ lsof | grep hdfs | wc -l
>     322
>
>     ]$ lsof | grep hadoop | wc -l
>     162
>
>     ]$ cat /proc/sys/fs/file-nr
>     4416    0    7327615
>
>     ]$ date
>     Sun Mar 10 15:31:47 BRT 2013
>
>
>     What could be causing this? How could I extract more information about the error?
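>
>     One thing I might try to get more detail is raising the datanode's log
>     level at runtime with the daemonlog command, assuming it is available
>     in this version (untested here; 50075 is the infoPort shown in the logs):
>
>     ]# hadoop daemonlog -setlevel 172.17.2.18:50075 \
>          org.apache.hadoop.hdfs.server.datanode.DataNode DEBUG
>
>     and set it back to INFO afterwards, since DEBUG is very verbose.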
>
>     Thanks,
>     Pablo
>
>
>     On 03/08/2013 09:57 PM, Abdelrahman Shettia wrote:
>>     Hi,
>>
>>     If the open-files limits for the hbase and hdfs users are already
>>     set to more than 30K, please change dfs.datanode.max.xcievers to
>>     more than the value below.
>>
>>     <property>
>>       <name>dfs.datanode.max.xcievers</name>
>>       <value>2096</value>
>>       <description>PRIVATE CONFIG VARIABLE</description>
>>     </property>
>>
>>     Try to increase this value and tune it to your HBase usage.
>>
>>
>>     Thanks
>>
>>     -Abdelrahman
>>
>>
>>
>>
>>
>>
>>     On Fri, Mar 8, 2013 at 9:28 AM, Pablo Musa <pablo@psafe.com> wrote:
>>
>>         I am also having this issue and tried a lot of solutions, but
>>         could not solve it.
>>
>>         ]# ulimit -n        # run both as root and as hdfs (the datanode user)
>>         32768
>>
>>         ]# cat /proc/sys/fs/file-nr
>>         2080    0    8047008
>>
>>         ]# lsof | wc -l
>>         5157
>>
>>         Sometimes this issue happens from one node to the same node :(
>>
>>         I also think this issue is messing with my regionservers
>>         which are crashing all day long!!
>>
>>         Thanks,
>>         Pablo
>>
>>
>>         On 03/08/2013 06:42 AM, Dhanasekaran Anbalagan wrote:
>>>         Hi Varun
>>>
>>>         I believe it is not a ulimit issue.
>>>
>>>
>>>         /etc/security/limits.conf
>>>         # End of file
>>>         *               -      nofile      1000000
>>>         *               -      nproc       1000000
>>>
>>>
>>>         Please guide me, guys, I want to fix this. Please share your
>>>         thoughts on this DataXceiver error.
>>>
>>>         Did I learn something today? If not, I wasted it.
>>>
>>>
>>>         On Fri, Mar 8, 2013 at 3:50 AM, varun kumar
>>>         <varun.uid@gmail.com> wrote:
>>>
>>>             Hi Dhana,
>>>
>>>             Increase the ulimit for all the datanodes.
>>>
>>>             If you are starting the service as the hadoop user, increase
>>>             the ulimit value for the hadoop user.
>>>
>>>             Make the changes in the following file:
>>>
>>>             /etc/security/limits.conf
>>>
>>>             Example:
>>>             hadoop          soft  nofile          35000
>>>             hadoop          hard  nofile          35000
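>>>
>>>             To verify the new limit is actually picked up (limits.conf
>>>             only applies to fresh login sessions, so restart the daemons
>>>             from a new shell), a quick check, untested sketch:
>>>
>>>             ]# su - hadoop -c 'ulimit -Hn; ulimit -Sn'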
>>>
>>>             Regards,
>>>             Varun Kumar.P
>>>
>>>             On Fri, Mar 8, 2013 at 1:15 PM, Dhanasekaran Anbalagan
>>>             <bugcy013@gmail.com> wrote:
>>>
>>>                 Hi Guys
>>>
>>>                 I am frequently getting this error on my datanodes.
>>>
>>>                 Please help me understand what exactly the problem is.
>>>
>>>                 dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK
>>>                 operation src: /172.16.30.138:50373 dest: /172.16.30.138:50010
>>>
>>>                 java.net.SocketTimeoutException: 70000 millis timeout while waiting
>>>                 for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
>>>                 local=/172.16.30.138:34280 remote=/172.16.30.140:50010]
>>>                 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>                 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
>>>                 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
>>>                 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:115)
>>>                 at java.io.FilterInputStream.read(FilterInputStream.java:66)
>>>                 at java.io.FilterInputStream.read(FilterInputStream.java:66)
>>>                 at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:160)
>>>                 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:405)
>>>                 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>>>                 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>>>                 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
>>>                 at java.lang.Thread.run(Thread.java:662)
>>>
>>>                 dvcliftonhera138:50010:DataXceiver error processing WRITE_BLOCK
>>>                 operation src: /172.16.30.138:50531 dest: /172.16.30.138:50010
>>>
>>>                 java.io.EOFException: while trying to read 65563 bytes
>>>                 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:408)
>>>                 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:452)
>>>                 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:511)
>>>                 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:748)
>>>                 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:462)
>>>                 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>>>                 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
>>>                 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
>>>                 at java.lang.Thread.run(Thread.java:662)
>>>
>>>                 How can I resolve this?
>>>
>>>                 -Dhanasekaran.
>>>
>>>                 Did I learn something today? If not, I wasted it.
>>>
>>>             -- 
>>>             Regards,
>>>             Varun Kumar.P
>>>
>>>
>>
>>
>
>

