hadoop-common-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7472) RPC client should deal with the IP address changes
Date Tue, 02 Aug 2011 06:50:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076071#comment-13076071
] 

Suresh Srinivas commented on HADOOP-7472:
-----------------------------------------

Client#connections is a map of ConnectionId to Connection. The ConnectionId hash code uses the
address corresponding to the InetSocketAddress, along with other keys (see ConnectionId#hashCode()).

Before failure, the map has ConnectionId(address x) mapping to Connection(address x).
On failure recovery, the map has ConnectionId(address x) mapping to Connection(address y).

At this point in time:
# Connection close correctly removes the map entry using the ConnectionId.
# A new ConnectionId(address x) still finds the cached Connection(address y)
# A new ConnectionId(address y) *will not find* cached Connection(address y).

On encountering 3) for the first time, a new ConnectionId(address y) to Connection(address y)
mapping will be added to the map. This is not a problem. I just wanted to see if you can think
of any issues I might be missing, given this description.
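The stale-cache scenario above can be illustrated with a minimal sketch. The class names mirror ConnectionId and Connection in Client, but the String-address fields and the way the post-recovery state is set up are simplifying assumptions, not the real implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheDemo {
    // Simplified stand-in for the real ConnectionId: equality and hashing
    // are based on the resolved address, as in ConnectionId#hashCode().
    static final class ConnectionId {
        final String resolvedAddress;
        ConnectionId(String resolvedAddress) { this.resolvedAddress = resolvedAddress; }
        @Override public int hashCode() { return resolvedAddress.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof ConnectionId
                && ((ConnectionId) o).resolvedAddress.equals(resolvedAddress);
        }
    }

    // Simplified stand-in for Connection: it just remembers the address it
    // actually connected to, which may differ after failure recovery.
    static final class Connection {
        final String connectedAddress;
        Connection(String connectedAddress) { this.connectedAddress = connectedAddress; }
    }

    static final Map<ConnectionId, Connection> connections = new HashMap<>();

    public static void main(String[] args) {
        // Model the post-recovery state: the map holds
        // ConnectionId(address x) -> Connection(address y).
        connections.put(new ConnectionId("x"), new Connection("y"));

        // 2) A new ConnectionId(address x) still finds the cached Connection(address y).
        Connection byOld = connections.get(new ConnectionId("x"));
        System.out.println("lookup by x -> "
            + (byOld == null ? "miss" : byOld.connectedAddress));

        // 3) A new ConnectionId(address y) does NOT find the cached Connection(address y).
        Connection byNew = connections.get(new ConnectionId("y"));
        System.out.println("lookup by y -> "
            + (byNew == null ? "miss" : byNew.connectedAddress));
    }
}
```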

Minor comment on the patch: isAddressChanged(boolean update) is always called with update
set to true. Do you think it would be better to rename this method to updateAddress() with
no args? Also, instead of {{Check whether the hostname:address mapping is still valid.}},
we could say something like {{Update the address corresponding to the server if the address
corresponding to the host name has changed.}}
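The suggested updateAddress() shape might look like the sketch below. The enclosing class and the method body are assumptions based on the behavior described here, not the actual patch:

```java
import java.net.InetSocketAddress;

// Hypothetical holder for a server address, standing in for the
// Connection-side state the patch actually updates.
class AddressTracker {
    private InetSocketAddress server;

    AddressTracker(InetSocketAddress server) { this.server = server; }

    /**
     * Update the address corresponding to the server if the address
     * corresponding to the host name has changed.
     * @return true if the resolved address changed
     */
    boolean updateAddress() {
        // Re-resolve the host name. A new InetSocketAddress must be
        // constructed because the class is immutable.
        InetSocketAddress current =
            new InetSocketAddress(server.getHostName(), server.getPort());
        if (current.getAddress() != null
            && !current.getAddress().equals(server.getAddress())) {
            server = current;
            return true;
        }
        return false;
    }

    InetSocketAddress getServer() { return server; }
}
```

Returning a boolean lets the reconnect loop decide whether a fresh connect attempt is worthwhile, without the caller having to compare addresses itself.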


> RPC client should deal with the IP address changes
> --------------------------------------------------
>
>                 Key: HADOOP-7472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7472
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.205.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: addr_change_dfs-1.patch.txt, addr_change_dfs-2.patch.txt, addr_change_dfs.patch.txt
>
>
> The current RPC client implementation and the client-side callers assume that the hostname-address
mappings of servers never change. The resolved address is stored in an immutable InetSocketAddress
object above/outside RPC, and the reconnect logic in the RPC Connection implementation also
trusts the resolved address that was passed down.
> If the NN suffers a failure that requires migration, it may be started on a different
node with a different IP address. In this case, even if the name-address mapping is updated
in DNS, the cluster is stuck trying the old address until the whole cluster is restarted.
> The RPC client-side should detect this situation and exit or try to recover.
> Updating the ConnectionId within the Client implementation may get the system working for
the moment, but there is always a risk of the cached address:port becoming connectable again
unintentionally. The real solution is to notify the upper layer of the address change so that
it can re-resolve and retry, or to re-architect the system as discussed in HDFS-34.
> For the 0.20 lines, some type of compromise may be acceptable. For example, raise a custom
exception so that some well-defined high-impact upper layers can re-resolve and retry, while
others will have to restart. For TRUNK, the HA work will most likely determine what needs to
be done, so this Jira won't cover the solutions for TRUNK.
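The immutability point in the description can be seen directly in java.net.InetSocketAddress: the host name is resolved once at construction and the result is frozen in the object, so picking up a DNS change requires building a new instance. A minimal illustration (localhost is used only so the sketch resolves anywhere):

```java
import java.net.InetSocketAddress;

public class ResolutionDemo {
    public static void main(String[] args) {
        // DNS resolution happens here, exactly once; the resolved address
        // is stored immutably in the object.
        InetSocketAddress addr = new InetSocketAddress("localhost", 8020);
        System.out.println("resolved:  " + addr.getAddress());

        // Even if DNS later maps the host elsewhere, addr keeps the old
        // address forever. Re-resolving means constructing a new instance
        // from the host name and port.
        InetSocketAddress refreshed =
            new InetSocketAddress(addr.getHostName(), addr.getPort());
        System.out.println("refreshed: " + refreshed.getAddress());
    }
}
```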

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
