From: marius
Date: Mon, 20 Jul 2015 16:50:36 +0200
To: user@hadoop.apache.org
Subject: Re: sendChunks error

Hi,

I tried reinstalling Hadoop on all nodes; it is now a five-node setup (4 slaves, 1 combined slave/master). It still gives me the same error on all nodes, but the error is not consistent: it comes and goes from time to time. This is the log from one DataNode:
http://pastebin.com/SQd0G5tF

It is still Hadoop 2.6.0 on CentOS 7; the hardware varies from node to node.

These are my configs:
http://pastebin.com/Fmi8bafT

Greetings Marius



On 17.07.2015 at 18:15, Ted Yu wrote:
bq. IOException: Die Verbindung wurde vom Kommunikationspartner zurückgesetzt

It looks like the above means 'The connection was reset by the communication partner'.

Which Hadoop release do you use?

Can you pastebin more of the DataNode log?

Thanks

On Fri, Jul 17, 2015 at 9:11 AM, marius <m.die0123@googlemail.com> wrote:
Hi,

When I tried to run some jobs on my Hadoop cluster, I found the following error in my DataNode logs
(the German means "connection reset by peer"):

2015-07-17 16:33:45,671 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception:
java.io.IOException: Die Verbindung wurde vom Kommunikationspartner zurückgesetzt
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:728)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:496)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)

I already googled this, but I could not find anything.
The error appears several times and then vanishes; the job proceeds normally and does not fail. This happens on various nodes. I already formatted my NameNode, but that did not fix it.
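Because the error comes and goes, it can help to see how the sendChunks() errors cluster in time before digging further. Below is a minimal Java sketch, not part of Hadoop: the class name SendChunksErrorCounter and the log path are hypothetical, and it assumes every error entry starts with the same timestamp and level layout as the log line shown above. It counts matching ERROR lines per minute.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: counts "BlockSender.sendChunks() exception" entries in a
 * DataNode log, bucketed by minute. Assumes lines shaped like
 * "2015-07-17 16:33:45,671 ERROR org.apache...DataNode: BlockSender.sendChunks() exception:".
 */
public class SendChunksErrorCounter {

    // Group 1 captures the "yyyy-MM-dd HH:mm" prefix of a matching ERROR line.
    private static final Pattern SEND_CHUNKS_ERROR = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}):\\d{2},\\d{3} ERROR .*BlockSender\\.sendChunks\\(\\) exception");

    /** Returns minute -> number of sendChunks() errors; TreeMap keeps minutes in chronological order. */
    public static Map<String, Integer> countByMinute(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = SEND_CHUNKS_ERROR.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SendChunksErrorCounter <path-to-datanode-log> (path is a placeholder).
        // ISO-8859-1 decodes any byte sequence, so non-UTF-8 bytes cannot abort the scan;
        // the pattern itself only matches ASCII text.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        countByMinute(lines).forEach((minute, n) -> System.out.println(minute + "  " + n));
    }
}
```

Stack-trace lines and the German exception message are skipped because only the ERROR header line is matched, so each failed transfer counts once.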

Thanks and greetings

Marius

