Date: Tue, 9 Aug 2011 22:23:27 +0000 (UTC)
From: "Kihwal Lee (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <758209478.21871.1312928607557.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1232241588.1162.1311015477383.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-7472) RPC client should deal with the IP address changes

[ 
https://issues.apache.org/jira/browse/HADOOP-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081970#comment-13081970 ]

Kihwal Lee commented on HADOOP-7472:
------------------------------------

For Trunk, {{mvn clean install -Ptar -Ptest-patch}} was run.

Results :

Tests in error:

Tests run: 1334, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6:58.706s
[INFO] Finished at: Tue Aug 09 17:21:52 CDT 2011
[INFO] Final Memory: 10M/52M
[INFO] ------------------------------------------------------------------------

The following is the failed test, which also fails without this patch.

Running org.apache.hadoop.fs.TestFilterFileSystem
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.246 sec <<< FAILURE!

The justification for the missing test was given in previous comments. I see a better chance of having a meaningful test in trunk than in 0.20-security. I will file a separate Jira for potentially introducing new packages that enable such a test.

> RPC client should deal with the IP address changes
> --------------------------------------------------
>
>                 Key: HADOOP-7472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7472
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.205.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: addr_change_dfs-1.patch.txt, addr_change_dfs-2.patch.txt, addr_change_dfs-3.patch.txt, addr_change_dfs.patch.txt, addr_change_dfs_0_20s-1.patch.txt, addr_change_dfs_0_20s-2.patch.txt, addr_change_dfs_0_20s.patch.txt, addr_change_dfs_trunk-1.patch.txt, addr_change_dfs_trunk-2.patch.txt, addr_change_dfs_trunk-3.patch.txt, addr_change_dfs_trunk.patch.txt
>
>
> The current RPC client implementation and the client-side callers assume that the hostname-address mappings of servers never change. The resolved address is stored in an immutable InetSocketAddress object above/outside RPC, and the reconnect logic in the RPC Connection implementation also trusts the resolved address that was passed down.
> If the NN suffers a failure that requires migration, it may be started on a different node with a different IP address. In this case, even if the name-address mapping is updated in DNS, the cluster is stuck trying the old address until the whole cluster is restarted.
> The RPC client side should detect this situation and exit or try to recover.
> Although updating ConnectionId within the Client implementation may get the system working for the moment, there is always a risk of the cached address:port becoming connectable again unintentionally. The real solution will be notifying the upper layers of the address change so that they can re-resolve and retry, or re-architecting the system as discussed in HDFS-34.
> For 0.20 lines, some type of compromise may be acceptable. For example, raise a custom exception for some well-defined high-impact upper layer to do re-resolve/retry, while others will have to restart.
> For TRUNK, the HA work will most likely determine what needs to be done, so this Jira won't cover the solutions for TRUNK.
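
The re-resolution idea described in the issue can be illustrated with a small sketch. This is not the attached patch: the class name AddressRefresher, the Resolver hook, and the updateAddress method are hypothetical names chosen for illustration, and the sketch only assumes that the client keeps the original hostname so it can look it up again before reconnecting.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

/** Illustrative only: re-resolves a cached server address before a reconnect. */
public class AddressRefresher {
    /** Hypothetical lookup hook; production code would pass InetAddress::getByName. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final int port;
    private final Resolver resolver;
    private InetSocketAddress server; // InetSocketAddress is immutable, so we swap the reference

    public AddressRefresher(String host, InetSocketAddress initial, Resolver resolver) {
        this.host = host;
        this.port = initial.getPort();
        this.server = initial;
        this.resolver = resolver;
    }

    /**
     * Looks the hostname up again and replaces the cached address if it differs.
     * Returns true when the address changed, so a caller can notify upper layers.
     */
    public synchronized boolean updateAddress() {
        InetAddress fresh;
        try {
            fresh = resolver.resolve(host);
        } catch (UnknownHostException e) {
            return false; // lookup failed: keep the cached address and let the caller retry
        }
        InetAddress cached = server.getAddress();
        if (cached != null && Arrays.equals(cached.getAddress(), fresh.getAddress())) {
            return false; // mapping unchanged
        }
        server = new InetSocketAddress(fresh, port);
        return true;
    }

    public synchronized InetSocketAddress current() {
        return server;
    }
}
```

A caller would invoke updateAddress() from its reconnect path and, when it returns true, either retry against current() or raise an exception for the upper layer, which is the compromise the description suggests for the 0.20 lines.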