Date: Tue, 18 Aug 2015 11:40:50 +0800
Subject: Confusing Yarn RPC Configuration
From: Jeff Zhang
To: user@hadoop.apache.org

I use yarn.resourcemanager.connect.max-wait.ms to control how long to wait when setting up the RM connection. But the weird thing I found is that this setting is not the real max wait time. YARN actually converts it to a retry count using yarn.resourcemanager.connect.retry-interval.ms.

Let's say yarn.resourcemanager.connect.max-wait.ms=10000 and yarn.resourcemanager.connect.retry-interval.ms=2000; then YARN will create RetryUpToMaximumCountWithFixedSleep with max count = 5 (10000/2000).

On top of that, each RM connection attempt has its own retry policy inside Hadoop RPC. Let's say ipc.client.connect.retry.interval=1000 and ipc.client.connect.max.retries=10, so each RM connection attempt will try 10 times and cost 10 seconds in total (1000*10).

So overall the RM connection would cost 50 seconds (10 * 5), and this number is not consistent with yarn.resourcemanager.connect.max-wait.ms, which confuses users. I am not sure of the purpose of 2 rounds of retry policy (YARN side and RPC internal side). Should there be only 1 round of retry policy, with the YARN-related configuration just overriding the RPC configuration?

BTW, I believe the same issue applies to the node manager connection.

--
Best Regards

Jeff Zhang

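To make the arithmetic in the message concrete, here is a minimal sketch of the estimate described above. It is not the actual YARN or Hadoop RPC code path: the function name and its parameters are hypothetical, the retry count is assumed to be an integer division of the two YARN settings, and, like the 50-second figure in the message, the estimate ignores the sleep between YARN-level retries and any per-connect timeouts.

```python
def estimate_rm_connect_wait_ms(max_wait_ms, retry_interval_ms,
                                ipc_retry_interval_ms, ipc_max_retries):
    """Rough worst-case wait for an RM connection under two nested retry layers.

    Hypothetical helper for illustration only; it mirrors the estimate in the
    message and does not call into Hadoop.
    """
    # YARN layer: max-wait is converted into a retry count (assumed integer division).
    yarn_retry_count = max_wait_ms // retry_interval_ms
    # RPC layer: each YARN-level attempt runs its own IPC connect retries.
    per_attempt_ms = ipc_retry_interval_ms * ipc_max_retries
    # Nested retries multiply instead of being capped by max_wait_ms.
    total_ms = yarn_retry_count * per_attempt_ms
    return yarn_retry_count, per_attempt_ms, total_ms


# Values from the message: max-wait 10000 ms, retry interval 2000 ms,
# ipc.client.connect.retry.interval 1000 ms, ipc.client.connect.max.retries 10.
count, per_attempt, total = estimate_rm_connect_wait_ms(10000, 2000, 1000, 10)
print(count, per_attempt, total)  # prints: 5 10000 50000
```

With these values the nominal 10-second max wait becomes roughly 50 seconds, which is the mismatch the message describes.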