Date: Thu, 25 Jul 2013 16:09:29 +0530
Subject: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0
From: Krishna Kishore Bonagiri
To: user@hadoop.apache.org

Hi,

I am running an application against hadoop-2.1.0-beta RC, and my app requires 117 containers. All of the containers were allocated, but while they were being started, at around the 99th container, the node manager went down with the following error in its log. I could also reproduce this error by running a "sleep 200; date" command with the Distributed Shell example, in which case the error appeared at around the 66th container.

2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11
        at java.lang.Thread.startImpl(Native Method)
        at java.lang.Thread.start(Thread.java:887)
        at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
        at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
        at java.security.AccessController.doPrivileged(AccessController.java:202)
        at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

Thanks,
Kishore
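
A note for anyone triaging the trace above: on Linux, errno 11 is EAGAIN, which thread creation can return when a per-user process/thread limit is reached. Whether that is the cause here is not confirmed by the log. The following is only a minimal sketch, assuming a Linux node and that it is run as the same user as the node manager; the helper names (`errno_from_log`, `describe_errno`, `process_limits`) are made up for illustration. It pulls the errno out of the log line, names it, and prints the current RLIMIT_NPROC values so they can be compared with the number of containers being started.

```python
import errno
import os
import re
import resource

# Error line copied from the node manager log in this message.
LOG_LINE = ("java.lang.OutOfMemoryError: Failed to create a thread: "
            "retVal -1073741830, errno 11")


def errno_from_log(line):
    """Return the integer that follows 'errno' in a log line, or None."""
    match = re.search(r"errno (\d+)", line)
    return int(match.group(1)) if match else None


def describe_errno(code):
    """Map an errno number to (symbolic name, message) on the local platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def process_limits():
    """Soft and hard RLIMIT_NPROC for the current user.

    On Linux this limit counts threads as well as processes.
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


code = errno_from_log(LOG_LINE)
name, message = describe_errno(code)
soft, hard = process_limits()
print(f"errno {code} = {name} ({message})")
print(f"RLIMIT_NPROC soft={soft} hard={hard}")
```

If the soft limit turns out to be low relative to the threads and child processes the node manager creates per container, that would be one thing to rule out first; the sketch itself does not prove it.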