Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of tellesnobrega@gmail.com
 designates 209.85.214.176 as permitted sender)
MIME-Version: 1.0
References: 
 <CADbqdAx1hacJWvDan-tJkAL=YvqDqdyjdC1Duj7SMUGBriuP1A@mail.gmail.com>
 <D0FD7A26.3458D%xgong@hortonworks.com>
From: Telles Nobrega <tellesnobrega@gmail.com>
Date: Mon, 09 Feb 2015 10:49:28 +0000
Message-ID: 
 <CADbqdAx9Gh0A=sAcEV8yO5JtO9vjaxfi_ysuMTaKwLr=ghEHwA@mail.gmail.com>
Subject: Re: Max Connect retries
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=001a113d2e2c4cbc1f050ea58609

--001a113d2e2c4cbc1f050ea58609
Content-Type: text/plain; charset=UTF-8

Thanks

On Mon Feb 09 2015 at 01:43:24 Xuan Gong <xgong@hortonworks.com> wrote:

>  That is the client connect retry at the IPC level.
>
> You can decrease maxRetries by configuring
>
> ipc.client.connect.max.retries.on.timeouts
>
> in core-site.xml
>
>
>  Thanks
>
>  Xuan Gong
>
>   From: Telles Nobrega <tellesnobrega@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Max Connect retries
>
>   Hi, I changed my cluster config so a failed NodeManager can be detected
> in about 30 seconds. When I'm running a wordcount the reduce gets stuck at
> 25% for quite a while and the logs show nodes trying to connect to the failed
> node:
>
>  org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>
> Is this the expected behaviour? Should I change max retries to a lower value? If so, which config is that?
>
> Thanks
>
>
>
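For reference, the setting named in the reply could look roughly like this in
core-site.xml. This is only a sketch: the value 5 is an arbitrary example, not
a recommendation made in this thread, and the maxRetries=45 seen in the log
appears to be the stock default for this property.

```xml
<!-- core-site.xml fragment; goes inside the existing <configuration> element -->
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <!-- Example value only (assumption); the log in this thread shows 45 -->
  <value>5</value>
</property>
```

Since this is a client-side retry, the change presumably has to be visible to
the processes that open the connection, not only to the node that failed.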

--001a113d2e2c4cbc1f050ea58609--