From: Rohith Sharma K S
To: "user@hadoop.apache.org"
Subject: RE: Time out after 600 for YARN mapreduce application
Date: Wed, 11 Feb 2015 10:31:57 +0000

Looking at the attempt ID, this is a mapper task timing out in a MapReduce job. The configuration that controls this threshold, and that can be increased, is 'mapreduce.task.timeout'.

The task is timed out because there was no heartbeat from the mapper task (YarnChild) to the MRAppMaster for 10 minutes. Is your MR job a custom job? If so, are you doing any operation in the Mapper's cleanup()? It is possible that a cleanup() which takes longer than the configured timeout causes the task to be marked as timed out.
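As an illustration (a minimal sketch only, not Nutch code; the TimeoutFriendlyMapper class and its flushChunk() helper are made-up names), the driver below raises 'mapreduce.task.timeout' and the Mapper's cleanup() calls context.progress() between steps so the MRAppMaster keeps receiving heartbeats:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TimeoutFriendlyMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, new LongWritable(1)); // placeholder map logic
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // If cleanup() does heavy work (flushing buffers, closing remote
        // connections, ...), report progress between steps so the attempt
        // is not killed with "Timed out after 600 secs".
        for (int chunk = 0; chunk < 100; chunk++) {
            flushChunk(chunk);   // hypothetical long-running cleanup step
            context.progress();  // heartbeat to the MRAppMaster
        }
    }

    private void flushChunk(int chunk) {
        // stand-in for expensive cleanup work
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Raise the per-task timeout from the 600000 ms default to 30 min.
        conf.setLong("mapreduce.task.timeout", 30 * 60 * 1000L);
        Job job = Job.getInstance(conf, "timeout-friendly job");
        job.setJarByClass(TimeoutFriendlyMapper.class);
        job.setMapperClass(TimeoutFriendlyMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that 'mapreduce.task.timeout' (600000 ms by default, hence "Timed out after 600 secs") applies per task attempt; the yarn.*.liveness-monitor settings listed in the original mail below govern ApplicationMaster and NodeManager liveness, not individual task heartbeats.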
Thanks & Regards
Rohith Sharma K S

From: Alexandru Pacurar [mailto:Alexandru.Pacurar@PropertyShark.com]
Sent: 11 February 2015 15:34
To: user@hadoop.apache.org
Subject: Time out after 600 for YARN mapreduce application

Hello,

I keep encountering an error when running Nutch on Hadoop YARN:

AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after 600 secs

Some info on my setup: I'm running a 64-node cluster with Hadoop 2.4.1. Each node has 4 cores, 1 disk, and 24 GB of RAM, and the namenode/resourcemanager has the same specs, only with 8 cores.

I am pretty sure one of these parameters is tied to the threshold I'm hitting:

yarn.am.liveness-monitor.expiry-interval-ms
yarn.nm.liveness-monitor.expiry-interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms

but I would like to understand why.

The issue usually appears under heavier load, and most of the time the next attempts succeed. Also, if I restart the Hadoop cluster the error goes away for some time.

Thanks,
Alex
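A quick way to confirm which threshold is actually in effect on a client node (again only a sketch; the PrintTaskTimeout class name is made up) is to load the job client configuration, which pulls in mapred-site.xml, and print the value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class PrintTaskTimeout {
    public static void main(String[] args) {
        // JobConf's static initializer registers mapred-default.xml and
        // mapred-site.xml as default resources, so the effective client
        // configuration is picked up.
        Configuration conf = new JobConf();
        long timeoutMs = conf.getLong("mapreduce.task.timeout", 600000L);
        System.out.println("mapreduce.task.timeout = " + timeoutMs + " ms");
    }
}

A value of 600000 ms here would match the "Timed out after 600 secs" diagnostic, pointing at the task heartbeat timeout rather than the YARN liveness-monitor settings.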