Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of xgong@hortonworks.com
 designates 64.78.52.187 as permitted sender)
Subject: Re: Max Connect retries
MIME-Version: 1.0
From: Xuan Gong <xgong@hortonworks.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Thread-Topic: Max Connect retries
Thread-Index: AQHQQ4Bvc9flKFLENUmZRPqCBTHgbJznvp+A
Date: Mon, 9 Feb 2015 04:42:32 +0000
Message-ID: <D0FD7A26.3458D%xgong@hortonworks.com>
References: 
 <CADbqdAx1hacJWvDan-tJkAL=YvqDqdyjdC1Duj7SMUGBriuP1A@mail.gmail.com>
In-Reply-To: 
 <CADbqdAx1hacJWvDan-tJkAL=YvqDqdyjdC1Duj7SMUGBriuP1A@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_D0FD7A263458Dxgonghortonworkscom_"

--_000_D0FD7A263458Dxgonghortonworkscom_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

That log message comes from the client connection retry logic at the IPC
level.

You can lower the retry limit (the maxRetries value in your log) by
setting

ipc.client.connect.max.retries.on.timeouts

in core-site.xml.
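
As a rough sketch, the entry could look like the fragment below. The
value 5 is only an illustrative assumption, not a recommended setting;
the shipped default is 45, which matches the maxRetries value in the log.
The property goes inside the existing <configuration> element of
core-site.xml on the nodes whose clients do the retrying.

```xml
<property>
  <!-- Max connect retries after a socket timeout (default: 45) -->
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <!-- Hypothetical lower value; tune it for your cluster -->
  <value>5</value>
</property>
```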


Thanks

Xuan Gong

From: Telles Nobrega <tellesnobrega@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Saturday, February 7, 2015 at 8:37 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Max Connect retries

Hi, I changed my cluster config so that a failed NodeManager can be =
detected in about 30 seconds. When I run a wordcount job, the reduce =
gets stuck at 25% for quite a while, and the logs show nodes trying to =
connect to the failed node:


org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844=
fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 =
time(s); maxRetries=3D45
2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.ha=
doop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attem=
pt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000

Is this the expected behaviour? Should I change max retries to a lower =
value? If so, which config property controls that?

Thanks


--_000_D0FD7A263458Dxgonghortonworkscom_--