Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of devaraj.k@huawei.com designates
 119.145.14.65 as permitted sender)
From: Devaraj k <devaraj.k@huawei.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: Node manager crashing when running an app requiring 100
 containers on hadoop-2.1.0-beta RC0
Thread-Topic: Node manager crashing when running an app requiring 100
 containers on hadoop-2.1.0-beta RC0
Thread-Index: AQHOiSNV3sfkGVDlh0623fiAQsMj4Jl1NsKQ
Date: Thu, 25 Jul 2013 10:54:16 +0000
Message-ID: 
 <06006DDA5A27D541991944AC4117E7A96E1E1722@szxeml560-mbx.china.huawei.com>
References: 
 <CAHg+sbNnFCEvVKP_MmBD_84TMwhGUarZHFeUPt-S3eHRxGZ5RA@mail.gmail.com>
In-Reply-To: 
 <CAHg+sbNnFCEvVKP_MmBD_84TMwhGUarZHFeUPt-S3eHRxGZ5RA@mail.gmail.com>
Accept-Language: en-US, zh-CN
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_06006DDA5A27D541991944AC4117E7A96E1E1722szxeml560mbxchi_"
MIME-Version: 1.0

--_000_06006DDA5A27D541991944AC4117E7A96E1E1722szxeml560mbxchi_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Kishore,

It seems that the system doesn't have enough resources to launch a new
thread.

Could you check whether the system can afford to launch the configured
number of containers? Also try increasing the native memory available
by reducing the number of processes running on the system.
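
For reference, errno 11 in the trace is EAGAIN on Linux, which thread
creation returns when the system lacks resources or a limit on the
number of threads has been reached. Below is a minimal Python sketch,
assuming a Linux host, that prints the per-user limits that are often
involved. The helper names (describe, read_limit, limit_line) are made
up for illustration. It only reports the limits of the session it runs
in, so run it as the user that starts the node manager.

```python
import resource


def describe(limit):
    """Render an rlimit value; RLIM_INFINITY means no cap is set."""
    return {resource.RLIM_INFINITY: "unlimited"}.get(limit, str(limit))


def read_limit(name):
    """Return the (soft, hard) pair for an rlimit constant name."""
    return resource.getrlimit(getattr(resource, name))


def limit_line(name):
    """Format one rlimit as 'NAME soft hard'."""
    return " ".join([name, *map(describe, read_limit(name))])


# RLIMIT_NPROC caps processes and threads per user on Linux.
# RLIMIT_STACK and RLIMIT_AS bound the stack and address space.
for name in ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
    print(limit_line(name))
```

If the soft RLIMIT_NPROC value is small compared with the number of
containers times the threads each one starts, that limit is a likely
candidate. Treat this as a guess to verify, not a diagnosis.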

Thanks
Devaraj k

From: Krishna Kishore Bonagiri [mailto:write2kishore@gmail.com]
Sent: 25 July 2013 16:09
To: user@hadoop.apache.org
Subject: Node manager crashing when running an app requiring 100 containers=
 on hadoop-2.1.0-beta RC0

Hi,

  I am running an application against hadoop-2.1.0-beta RC0, and my app
requires 117 containers. All the containers were allocated, but while
they were starting, at around the 99th container, the node manager went
down with the following kind of error in its log. I could also
reproduce this error by running a "sleep 200; date" command with the
Distributed Shell example, in which case the error appeared at around
the 66th container.


2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionH=
andler: Thread Thread[process reaper,5,main] threw an Error.  Shutting down=
 now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, =
errno 11
        at java.lang.Thread.startImpl(Native Method)
        at java.lang.Thread.start(Thread.java:887)
        at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
        at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
        at java.security.AccessController.doPrivileged(AccessController.jav=
a:202)
        at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with sta=
tus -1 Message: HaltException

Thanks,
Kishore

--_000_06006DDA5A27D541991944AC4117E7A96E1E1722szxeml560mbxchi_--