hadoop-common-user mailing list archives

From: C G <parallel...@yahoo.com>
Subject: Re: dfs.DataNode connection issues
Date: Wed, 16 Jul 2008 21:54:04 GMT
You should look at
https://issues.apache.org/jira/browse/HADOOP-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610003#action_12610003
as well.  This eliminates spurious "connection reset by peer" messages that
clutter up the DataNode logs and can be confusing.
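[For background on where these "connection reset" messages come from: they are TCP RST segments surfacing through the JDK socket layer. The following is a minimal, self-contained Java sketch (illustrative only, not Hadoop code) that reproduces the same java.net.SocketException a DataNode's sender thread sees when its peer aborts the connection mid-transfer:]

```java
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class ResetDemo {
    // Opens a loopback connection, aborts one end with SO_LINGER(0) so the
    // kernel sends RST instead of FIN on close, then keeps writing from the
    // other end until the JDK surfaces the reset as a SocketException.
    static String triggerReset() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoLinger(true, 0); // close() will now abort with RST
            client.close();
            OutputStream out = peer.getOutputStream();
            try {
                for (int i = 0; i < 1000; i++) {
                    out.write(new byte[8192]);
                    out.flush();
                }
            } catch (SocketException e) {
                // Typically "Connection reset" or "Broken pipe", matching
                // the exceptions quoted in the DataNode logs below.
                return e.getMessage();
            }
            return null; // no reset observed (not expected here)
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer saw: " + triggerReset());
    }
}
```

[The DataNode hits the same path when a reader goes away mid-block; the patch above just stops logging the benign cases at ERROR level.]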

--- On Wed, 7/16/08, brainstorm <braincode@gmail.com> wrote:

From: brainstorm <braincode@gmail.com>
Subject: Re: dfs.DataNode connection issues
To: core-user@hadoop.apache.org
Date: Wednesday, July 16, 2008, 10:25 AM

Raghu, seems to be resolved by your patch:

http://issues.apache.org/jira/browse/HADOOP-3007

Do you know of any other "complaints" on this issue (conn reset &
related errors) after applying this patch?

Thanks.

On Wed, Jul 16, 2008 at 4:04 PM, brainstorm <braincode@gmail.com> wrote:
> Just for the record, as I have seen on previous archives regarding
> this same problem, I've replaced the (cheap) 10/100 switch with a
> (robust?) 100/1000 one and a couple of ethernet cables... and nope, in
> my case it's not hardware related (at least on the switch/cable end).
>
> Any other hints?
>
> Thanks in advance!
>
> On Wed, Jul 16, 2008 at 3:12 PM, brainstorm <braincode@gmail.com> wrote:
>> If you refer to the other nodes:
>>
>> 2008-07-16 14:41:00,124 ERROR dfs.DataNode -
>> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
>> blk_7443738244200783289 has already been started (though not
>> completed), and thus cannot be created.
>>        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:638)
>>        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> 2008-07-16 14:41:00,309 ERROR dfs.DataNode -
>> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
>> blk_7443738244200783289 is valid, and cannot be written to.
>>        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:608)
>>        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>>        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> and:
>>
>> 2008-07-16 14:41:00,178 WARN  dfs.DataNode -
>> 192.168.0.253:50010:Failed to transfer blk_7443738244200783289 to
>> 192.168.0.252:50010 got java.net.SocketException: Connection reset
>>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> (These seem to be inter-node DFS communication errors as well :-/)
>>
>> On Tue, Jul 15, 2008 at 11:19 PM, Raghu Angadi <rangadi@yahoo-inc.com> wrote:
>>>
>>> Are there any errors reported on the other side of the socket (for the
>>> first error below, it's the datanode on 192.168.0.251)?
>>>
>>> Raghu.
>>>
>>> brainstorm wrote:
>>>>
>>>> I'm getting the following WARNINGs that seem to slow down my nutch
>>>> processes on a 3 node and 1 frontend cluster:
>>>>
>>>> 2008-07-15 18:53:19,048 WARN  dfs.DataNode -
>>>> 192.168.0.100:50010:Failed to transfer blk_-8676066332392254756 to
>>>> 192.168.0.251:50010 got java.net.SocketException: Connection reset
>>>>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>>        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>>        at java.lang.Thread.run(Thread.java:595)
>>>>
>>>> 2008-07-15 18:53:52,162 WARN  dfs.DataNode -
>>>> 192.168.0.100:50010:Failed to transfer blk_5699662911845813103 to
>>>> 192.168.0.253:50010 got java.net.SocketException: Broken pipe
>>>>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>>        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>>        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>>        at java.lang.Thread.run(Thread.java:595)
>>>>
>>>> I've looked for firewalling issues but right now the test setup is:
>>>>
>>>> 3 nodes with "iptables -F" (default ACCEPT policy for INPUT & OUTPUT,
>>>> aka no firewall).
>>>>
>>>> Frontend console (192.168.0.100) has ACCEPT for node-to-node & frontend.
>>>>
>>>> I've been debugging with wireshark, but all I see is RST packets sent
>>>> from frontend to nodes, no corrupted frames... When there's no reset,
>>>> I just see .jar contents flying by (RMI?)... What am I missing here?
>>>> :-S
>>>
>>>
>>
>
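[One low-tech check that complements the wireshark session quoted above: verify from each machine that the DataNode data-transfer port actually accepts TCP connections. A small Java sketch; the hosts and port 50010 are taken from the logs in this thread, so substitute your own cluster's addresses:]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a plain TCP connect to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, reset, or timed out
        }
    }

    public static void main(String[] args) {
        // 50010 is the DataNode transfer port seen in the stack traces above;
        // the hosts are the datanodes mentioned in this thread.
        for (String host : new String[] {"192.168.0.251", "192.168.0.252", "192.168.0.253"}) {
            System.out.println(host + ":50010 -> "
                    + (canConnect(host, 50010, 1000) ? "open" : "unreachable"));
        }
    }
}
```

[If a port shows as unreachable here while the daemon is up, the problem is connectivity or filtering rather than HDFS itself; if all ports are open, the resets are more likely application-level, as the JIRA issues above discuss.]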


      