From: German Florez-Larrahondo
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
References: <9B10B7A8-F9D9-491B-9972-835777DCD22E@gmail.com>
In-reply-to: <9B10B7A8-F9D9-491B-9972-835777DCD22E@gmail.com>
Subject: RE: How to ascertain why LinuxContainer dies?
Date: Fri, 14 Feb 2014 08:17:44 -0600
Message-id: <011501cf298f$8843ca80$98cb5f80$@samsung.com>
MIME-version: 1.0
Content-type: multipart/alternative; boundary="----=_NextPart_000_0116_01CF295D.3DAA44E0"
Content-language: en-us

This is a multipart message in MIME format.

------=_NextPart_000_0116_01CF295D.3DAA44E0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I believe that errors on containers are not propagated to the standard “Java” logs.
You have to look into the std* and syslog files of the container:

Here is an example:

.../userlogs/application_1391549207212_0006/container_1391549207212_0006_01_000027

[htf@gfldesktop container_1391549207212_0006_01_000027]$ ls -lart
total 60
-rw-rw-r--  1 htf htf     0 Feb  4 17:27 stdout
-rw-rw-r--  1 htf htf     0 Feb  4 17:27 stderr
drwx--x--- 28 htf htf  4096 Feb  4 17:27 ..
drwx--x---  2 htf htf  4096 Feb  4 17:27 .
-rw-rw-r--  1 htf htf 50471 Feb  4 17:31 syslog

Regards
./g

-----Original Message-----
From: Jay Vyas [mailto:jayunit100@gmail.com]
Sent: Friday, February 14, 2014 7:02 AM
To: user@hadoop.apache.org
Cc:
Subject: Re: How to ascertain why LinuxContainer dies?

Not sure where the containers dump standard out /error to? I figured it would be propagated in the node manager logs if anywhere, right?

Sent from my iPhone

> On Feb 14, 2014, at 4:46 AM, Harsh J <harsh@cloudera.com> wrote:
>
> Hi,
>
> Does your container command generate any stderr/stdout outputs that
> you can check under the container's work directory after it fails?
>
>> On Fri, Feb 14, 2014 at 9:46 AM, Jay Vyas <jayunit100@gmail.com> wrote:
>> I have a linux container that dies.
>> The nodemanager logs only say:
>>
>> WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
>> Exception from container-launch :
>> org.apache.hadoop.util.Shell$ExitCodeException:
>>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
>>   at org.apache.hadoop.util.Shell.run(Shell.java:129)
>>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
>>   at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
>>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
>>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:662)
>>
>> where can i find the root cause of the non-zero exit code ?
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>
>
>
> --
> Harsh J

------=_NextPart_000_0116_01CF295D.3DAA44E0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

------=_NextPart_000_0116_01CF295D.3DAA44E0--
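
The lookup described in this thread could be scripted roughly as follows. This is only a sketch, not part of Hadoop: the function names and the `log_root` argument are hypothetical, and the layout `<log_root>/<application id>/<container id>/{stdout,stderr,syslog}` is assumed from the example listing in the message. It reports the size of each log file so that the non-empty ones (syslog in the example) can be read first.

```python
from pathlib import Path

# File names taken from the `ls -lart` listing in the message.
LOG_NAMES = ("stdout", "stderr", "syslog")


def container_log_files(log_root, application_id, container_id):
    """Return {name: size_in_bytes} for the container log files that exist.

    Assumes the layout <log_root>/<application_id>/<container_id>/<name>.
    log_root is a hypothetical parameter: whichever directory on the node
    holds the per-application log directories.
    """
    container_dir = Path(log_root) / application_id / container_id
    sizes = {}
    for name in LOG_NAMES:
        path = container_dir / name
        if path.is_file():
            sizes[name] = path.stat().st_size
    return sizes


def non_empty_logs(sizes):
    """Names of the log files that contain data, largest first."""
    candidates = [name for name, size in sizes.items() if size > 0]
    return sorted(candidates, key=lambda name: sizes[name], reverse=True)
```

Calling `non_empty_logs(container_log_files(root, app_id, container_id))` on a directory like the one in the listing would point at syslog, since stdout and stderr are empty there.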