Subject: Re: ConnectionException in container, happens only sometimes
From: Andrei
To: user@hadoop.apache.org
Date: Thu, 11 Jul 2013 11:26:50 +0300

Here are the logs of the RM and the two NMs:

RM (master-host): http://pastebin.com/q4qJP8Ld
NM where the AM ran (slave-1-host): http://pastebin.com/vSsz7mjG
NM where the slave container ran (slave-2-host): http://pastebin.com/NMFi6gRp

The only related error I've found in them is the following (from the RM logs):

...
2013-07-11 07:46:06,225 ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1373465780870_0005_000001
2013-07-11 07:46:06,227 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.yarn.api.AMRMProtocolPB.allocate from 10.128.40.184:47101: output error
2013-07-11 07:46:06,228 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8030 caught an exception
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:456)
	at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2140)
	at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
	at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:939)
	at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1005)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1747)
2013-07-11 07:46:11,238 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved my_user to /default-rack
2013-07-11 07:46:11,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node my_user(cmPort: 59267 httpPort: 8042) registered with capability: 8192, assigned nodeId my_user:59267
...

Though from the stack trace alone, it's hard to tell where this error came from.
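One way to line this RM error up with what the two NMs were logging at the same moment is to pull every entry within a few seconds of 07:46:06 from each of the three logs. Below is a minimal Python sketch of that idea, not an existing Hadoop tool. It assumes the log4j layout seen in the excerpt above (each entry starts with a "yyyy-MM-dd HH:mm:ss,SSS" timestamp, and stack frames follow on their own lines), and that the hosts' clocks are roughly in sync, since it compares timestamps across machines. The helper names parse_entries and entries_near are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed layout (matches the RM excerpt): each entry starts with a
# timestamp like "2013-07-11 07:46:06,225"; lines without one (stack
# frames, wrapped messages) belong to the entry above them.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")


def parse_entries(text: str) -> List[Tuple[datetime, str]]:
    """Split raw log text into (timestamp, entry) pairs.

    Continuation lines are appended to the preceding entry; any lines
    before the first timestamped line are skipped."""
    entries: List[Tuple[datetime, str]] = []
    for line in text.splitlines():
        m = TIMESTAMP.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            ts += timedelta(milliseconds=int(m.group(2)))
            entries.append((ts, line))
        elif entries:
            # Continuation line, e.g. "\tat sun.nio.ch...": attach it.
            ts, body = entries[-1]
            entries[-1] = (ts, body + "\n" + line)
    return entries


def entries_near(text: str, center: str, seconds: float) -> List[str]:
    """Return the entries logged within `seconds` of `center`,
    where `center` is given as 'YYYY-MM-DD HH:MM:SS'."""
    c = datetime.strptime(center, "%Y-%m-%d %H:%M:%S")
    window = timedelta(seconds=seconds)
    return [body for ts, body in parse_entries(text) if abs(ts - c) <= window]
```

For example, after saving the three pastebin logs locally (file names such as rm.log are hypothetical), entries_near(open("rm.log").read(), "2013-07-11 07:46:06", 10) returns the RM entries from ten seconds either side of the error, with each stack trace kept attached to its entry; the same call on the slave-1-host and slave-2-host logs shows what the AM's node and the container's node recorded in that window.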

Let me know if you need any more information.

On Thu, Jul 11, 2013 at 1:00 AM, Andrei <faithlessfriend@gmail.com> wrote:
> Hi Omkar,
>
> I'm out of the office now, so I'll post them as soon as I get back there.
>
> Thanks
>
>
> On Thu, Jul 11, 2013 at 12:39 AM, Omkar Joshi <ojoshi@hortonworks.com> wrote:
>
>> Can you post the RM/NM logs too?
>>
>> Thanks,
>> Omkar Joshi
>> Hortonworks Inc.