hadoop-common-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7472) RPC client should deal with the IP address changes
Date Tue, 09 Aug 2011 22:23:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081970#comment-13081970 ]

Kihwal Lee commented on HADOOP-7472:
------------------------------------

For Trunk, {{mvn clean install -Ptar -Ptest-patch}} was run.
Results :

Tests in error: 

Tests run: 1334, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6:58.706s
[INFO] Finished at: Tue Aug 09 17:21:52 CDT 2011
[INFO] Final Memory: 10M/52M
[INFO] ------------------------------------------------------------------------

The following is the failed test, which also fails without this patch.

Running org.apache.hadoop.fs.TestFilterFileSystem
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.246 sec <<< FAILURE!

The justification for the missing test was given in previous comments.  I see a better chance
of having a meaningful test in trunk than in 0.20-security. I will file a separate Jira for
potentially introducing new packages that enable such a test.

> RPC client should deal with the IP address changes
> --------------------------------------------------
>
>                 Key: HADOOP-7472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7472
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.205.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: addr_change_dfs-1.patch.txt, addr_change_dfs-2.patch.txt, addr_change_dfs-3.patch.txt,
> addr_change_dfs.patch.txt, addr_change_dfs_0_20s-1.patch.txt, addr_change_dfs_0_20s-2.patch.txt,
> addr_change_dfs_0_20s.patch.txt, addr_change_dfs_trunk-1.patch.txt, addr_change_dfs_trunk-2.patch.txt,
> addr_change_dfs_trunk-3.patch.txt, addr_change_dfs_trunk.patch.txt
>
>
> The current RPC client implementation and the client-side callers assume that the hostname-address
> mappings of servers never change. The resolved address is stored in an immutable InetSocketAddress
> object above/outside RPC, and the reconnect logic in the RPC Connection implementation also
> trusts the resolved address that was passed down.
> If the NN suffers a failure that requires migration, it may be started on a different
> node with a different IP address. In this case, even if the name-address mapping is updated
> in DNS, the cluster is stuck trying the old address until the whole cluster is restarted.
> The RPC client side should detect this situation and exit or try to recover.
> Updating ConnectionId within the Client implementation may get the system working for the
> moment, but there is always a risk of the cached address:port becoming connectable again unintentionally.
> The real solution is to notify the upper layers of the address change so that they can re-resolve
> and retry, or to re-architect the system as discussed in HDFS-34.
> For the 0.20 lines, some type of compromise may be acceptable. For example, raise a custom
> exception so that some well-defined high-impact upper layers can re-resolve and retry, while others
> will have to restart.  For TRUNK, the HA work will most likely determine what needs to be
> done, so this Jira won't cover the solutions for TRUNK.
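
To illustrate the stale-address problem described above, here is a minimal sketch of how a caller might re-resolve a cached InetSocketAddress after a connection failure. It is not taken from the attached patches; the class name AddressRefresh and the method refresh are hypothetical and used only for illustration. It assumes the cached address was created from a host name, not an IP literal.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop or of the attached patches.
class AddressRefresh {
    /**
     * Re-resolves the host name of a cached address.
     * Returns a new InetSocketAddress if the host now maps to a
     * different IP address; otherwise returns the cached instance.
     */
    static InetSocketAddress refresh(InetSocketAddress cached) {
        // This constructor attempts a fresh lookup of the host name.
        InetSocketAddress current =
            new InetSocketAddress(cached.getHostName(), cached.getPort());
        if (current.isUnresolved()) {
            // Lookup failed; keep the address we already have.
            return cached;
        }
        if (cached.isUnresolved()
                || !current.getAddress().equals(cached.getAddress())) {
            // The mapping changed (or was never resolved): use the new one.
            return current;
        }
        return cached;
    }
}
```

A caller could invoke refresh(addr) after a connect failure and retry only when the returned object differs from the one passed in. Note that the JVM caches successful name lookups (see the networkaddress.cache.ttl security property), so a DNS change may not be visible immediately.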

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
