Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
Date: Tue, 18 Aug 2015 11:40:50 +0800
Message-ID: 
 <CAADy7x5owRJZyTewvzw1ww3TV6uJHs3kAYmXjGyNyrejotukwg@mail.gmail.com>
Subject: Confusing Yarn RPC Configuration
From: Jeff Zhang <zjffdu@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a113366be37dba9051d8dafb9

--001a113366be37dba9051d8dafb9
Content-Type: text/plain; charset=UTF-8

I use yarn.resourcemanager.connect.max-wait.ms to control how long to wait
when setting up the RM connection. The odd thing I found is that this
setting is not the real maximum wait time. YARN actually converts it into a
retry count by dividing it by
yarn.resourcemanager.connect.retry-interval.ms.
For example, with yarn.resourcemanager.connect.max-wait.ms=10000 and
yarn.resourcemanager.connect.retry-interval.ms=2000, YARN creates a
RetryUpToMaximumCountWithFixedSleep policy with max count = 5 (10000/2000).

On top of that, each RM connection attempt has its own retry policy inside
Hadoop RPC. With ipc.client.connect.retry.interval=1000 and
ipc.client.connect.max.retries=10, each RM connection attempt tries 10
times and takes about 10 seconds in total (1000 ms * 10). So overall the RM
connection can take about 50 seconds (10 s * 5), which is inconsistent with
yarn.resourcemanager.connect.max-wait.ms and confuses users. I am not sure
what the purpose of two rounds of retry policy is (YARN side and RPC
internal side). Should there be only one round of retry, with the
YARN-specific configuration simply overriding the RPC configuration?
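
To make the arithmetic concrete, here is a minimal sketch of the
calculation described above. It is not Hadoop code: the class and method
names (RetryBudget, yarnRetryCount, ipcWaitMs, totalWaitMs) are made up for
illustration, and it only multiplies the configured values. It ignores the
fixed sleep between YARN-level retries and any socket connect timeouts, so
the real wait is probably somewhat longer than this estimate.

```java
public class RetryBudget {

    // YARN-level retry count as described in this message:
    // yarn.resourcemanager.connect.max-wait.ms divided by
    // yarn.resourcemanager.connect.retry-interval.ms (integer division).
    public static long yarnRetryCount(long maxWaitMs, long retryIntervalMs) {
        return maxWaitMs / retryIntervalMs;
    }

    // Approximate time one RM connection attempt spends inside the RPC
    // client: ipc.client.connect.retry.interval multiplied by
    // ipc.client.connect.max.retries.
    public static long ipcWaitMs(long ipcRetryIntervalMs, long ipcMaxRetries) {
        return ipcRetryIntervalMs * ipcMaxRetries;
    }

    // Rough overall wait: every YARN-level retry pays the full RPC-level
    // retry cost. Sleeps between YARN-level retries are NOT included.
    public static long totalWaitMs(long maxWaitMs, long retryIntervalMs,
                                   long ipcRetryIntervalMs, long ipcMaxRetries) {
        return yarnRetryCount(maxWaitMs, retryIntervalMs)
                * ipcWaitMs(ipcRetryIntervalMs, ipcMaxRetries);
    }

    public static void main(String[] args) {
        long configuredMaxWaitMs = 10000;
        long estimatedMs = totalWaitMs(configuredMaxWaitMs, 2000, 1000, 10);
        System.out.println("configured max wait (ms): " + configuredMaxWaitMs);
        System.out.println("estimated actual wait (ms): " + estimatedMs);
    }
}
```

With the values from the example, the estimate is 50000 ms against a
configured max wait of 10000 ms, which is the mismatch I am asking about.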

BTW, I believe the same issue applies to the NodeManager connection.

-- 
Best Regards

Jeff Zhang

--001a113366be37dba9051d8dafb9--