Subject: Review Request 36231: Revert, The Default hdfs-site.xml Should Have Client Retry Logic Enabled For Rolling Upgrade
From: "Alejandro Fernandez"
To: "Nate Cole", "Jonathan Hurley"
Cc: "Alejandro Fernandez", "Ambari"
Reply-To: "Alejandro Fernandez"
Date: Mon, 06 Jul 2015 23:58:46 -0000
Message-ID: <20150706235846.8752.11257@reviews.apache.org>
Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
X-ReviewGroup: Ambari
X-ReviewRequest-URL: https://reviews.apache.org/r/36231/
X-ReviewRequest-Repository: ambari
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="===============5212987736646817197=="

--===============5212987736646817197==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36231/
-----------------------------------------------------------

Review request for Ambari, Jonathan Hurley and Nate Cole.


Repository: ambari


Description
-------

In an HA cluster where the former primary NN was killed "dirty", by catastrophic power-down or equivalent, and the cluster has successfully failed over to the other NN, a client that first attempts to contact the dead NN takes 10 minutes to switch to the other NN.

In Ambari 2.0 and HDP 2.2, dfs.client.retry.policy.enabled was not set at all.
Recently, in Ambari 2.1 for HDP 2.3, it was defaulted to true as part of AMBARI-11192.
However, this causes problems during rolling upgrade (RU).

In an HA setup, retries should be handled by RetryInvocationHandler using the retry policy FailoverOnNetworkExceptionRetry. The client first translates the nameservice ID into two host names and creates an individual RPC proxy for each NameNode accordingly. Each individual NameNode proxy still uses MultipleLinearRandomRetry as its local retry policy, but because we usually set dfs.client.retry.policy.enabled to false, this internal retry is actually disabled. If we then hit any connection issue or remote exception (including StandbyException), the exception is caught by RetryInvocationHandler and handled according to FailoverOnNetworkExceptionRetry. In this way the client can fail over to the other NameNode immediately instead of repeatedly retrying the same NameNode.

Here, however, because dfs.client.retry.policy.enabled is set to true, MultipleLinearRandomRetry is triggered inside the internal NameNode proxy, so we have to wait 10+ minutes. The exception is thrown to RetryInvocationHandler only after all the retries of MultipleLinearRandomRetry have failed.

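To make the "10+ min" figure above concrete, here is a rough sketch of the arithmetic. It assumes the inner MultipleLinearRandomRetry policy is configured by a spec of "sleepMillis,retries" pairs, and it uses "10000,6,60000,10", which to my understanding is the Hadoop default for dfs.client.retry.policy.spec; please verify against your version. The class and method names are hypothetical and are not part of Hadoop or Ambari. The sketch ignores the randomization of each sleep and any connect timeouts, so it gives a nominal figure only.

```java
/**
 * Hypothetical helper (not Hadoop or Ambari code): estimates how long the
 * inner per-NameNode retry policy delays failover to the other NameNode.
 */
final class RetrySpecEstimate {

    /** Sums sleepMillis * retries over every "sleepMillis,retries" pair. */
    static long nominalWaitMillis(String spec) {
        String[] parts = spec.trim().split("\\s*,\\s*");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Expected sleep,retries pairs: " + spec);
        }
        long total = 0L;
        for (int i = 0; i < parts.length; i += 2) {
            long sleepMillis = Long.parseLong(parts[i]);
            int retries = Integer.parseInt(parts[i + 1]);
            if (sleepMillis < 0 || retries < 0) {
                throw new IllegalArgumentException("Negative value in spec: " + spec);
            }
            total += sleepMillis * retries;
        }
        return total;
    }

    /**
     * Simplified model: with the inner policy disabled the exception reaches
     * the failover handler right away (modelled as 0 ms); with it enabled,
     * every inner retry must fail first.
     */
    static long nominalDelayBeforeFailoverMillis(boolean retryPolicyEnabled, String spec) {
        return retryPolicyEnabled ? nominalWaitMillis(spec) : 0L;
    }

    public static void main(String[] args) {
        String spec = "10000,6,60000,10"; // assumed default, see note above
        System.out.println("enabled=true:  "
                + nominalDelayBeforeFailoverMillis(true, spec) + " ms");
        System.out.println("enabled=false: "
                + nominalDelayBeforeFailoverMillis(false, spec) + " ms");
    }
}
```

Under that assumed spec the nominal wait is 6 x 10 s + 10 x 60 s = 660 s, i.e. 11 minutes, which is in line with the delay described above.
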
Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/checks/CheckDescription.java 5e029f4
  ambari-server/src/main/java/org/apache/ambari/server/checks/ClientRetryPropertyCheck.java 4beba33
  ambari-server/src/main/resources/stacks/HDP/2.2/services/HDFS/configuration/hdfs-site.xml e42b3f8
  ambari-server/src/test/java/org/apache/ambari/server/checks/ClientRetryPropertyCheckTest.java d3fd187

Diff: https://reviews.apache.org/r/36231/diff/


Testing
-------

Unit tests passed,

----------------------------------------------------------------------
Total run:761
Total errors:0
Total failures:0
OK

I deployed my changes to a brand new cluster and it correctly set the hdfs-site property dfs.client.retry.policy.enabled to false.


Thanks,

Alejandro Fernandez

--===============5212987736646817197==--