hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
Date Thu, 05 Apr 2012 04:55:26 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247020#comment-13247020 ]

Suresh Srinivas commented on HDFS-3150:
---------------------------------------

Given that there is ongoing discussion about +1s from committers, it is probably a good
idea to wait for a +1. Should we also keep the release manager posted about this change? I generally
send an email to the hdfs/common dev lists about this kind of change.
                
> Add option for clients to contact DNs via hostname in branch-1
> --------------------------------------------------------------
>
>                 Key: HDFS-3150
>                 URL: https://issues.apache.org/jira/browse/HDFS-3150
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, hdfs client
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>             Fix For: 1.1.0
>
>         Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt
>
>
> Per the document attached to HADOOP-8198, this is just for branch-1, and unbreaks DN multihoming. The datanode can be configured to listen on a bond, or on all interfaces by specifying the wildcard in the dfs.datanode.*.address configuration options; however, per HADOOP-6867, only the source address of the registration is exposed to clients. HADOOP-985 made clients access datanodes by IP, primarily to avoid the latency of a DNS lookup, and this had the side effect of breaking DN multihoming. To fix it, let's add back the option for Datanodes to be accessed by hostname. This can be done by:
> # Modifying the primary field of the Datanode descriptor to be the hostname, or
> # Modifying Client/Datanode <-> Datanode access to use the hostname field instead of the IP
> I'd like to go with approach #2, as it does not require an incompatible change to the client protocol and is much less invasive. It limits the modification to just the places where clients and Datanodes connect, rather than changing all uses of Datanode identifiers.
> New client and Datanode configuration options are introduced:
> - {{dfs.client.use.datanode.hostname}} indicates that all client-to-datanode connections should use the datanode hostname (clients outside the cluster may not be able to route to the IP)
> - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer
> If the configuration options are not used, there is no change in the current behavior.
> Btw, I'm doing something similar to #1 in trunk in HDFS-3144: refactoring the use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc.) based on the context in which the ID is used, rather than always using IP:xferPort as the Datanode's name and using that name everywhere.
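
A minimal sketch of approach #2 as described above, assuming a simplified model: the Datanode identity keeps both the IP and the hostname, and only the code that opens a data-transfer connection decides which one to use, based on the two configuration keys named in the description (both treated as defaulting to false). The DatanodeId record, its field names, the xferAddr, clientConnectAddr and datanodeConnectAddr helpers, and the use of a Map in place of Hadoop's Configuration are all hypothetical illustrations, not the actual branch-1 patch; the addresses and port are example values only.

```java
import java.util.Map;

// Hypothetical sketch of approach #2: identity is unchanged, only connect sites pick IP vs hostname.
public class DatanodeAddressSketch {

    // Config keys from the HDFS-3150 description; absent means false (current behavior).
    static final String CLIENT_USE_HOSTNAME = "dfs.client.use.datanode.hostname";
    static final String DN_USE_HOSTNAME = "dfs.datanode.use.datanode.hostname";

    // Hypothetical stand-in for the Datanode identifier; field names are illustrative.
    record DatanodeId(String ipAddr, String hostName, int xferPort) {
        // Data-transfer address: hostname if requested, otherwise the registration IP.
        String xferAddr(boolean useHostname) {
            return (useHostname ? hostName : ipAddr) + ":" + xferPort;
        }
    }

    // A Map stands in for Hadoop's Configuration in this sketch; missing keys read as false.
    static boolean getBoolean(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Client -> Datanode connection (e.g. reading or writing a block).
    static String clientConnectAddr(Map<String, String> conf, DatanodeId dn) {
        return dn.xferAddr(getBoolean(conf, CLIENT_USE_HOSTNAME));
    }

    // Datanode -> Datanode connection (e.g. forwarding data to another Datanode).
    static String datanodeConnectAddr(Map<String, String> conf, DatanodeId target) {
        return target.xferAddr(getBoolean(conf, DN_USE_HOSTNAME));
    }

    public static void main(String[] args) {
        DatanodeId dn = new DatanodeId("10.0.0.5", "dn1.example.com", 50010);
        // No options set: connect by IP, as before.
        System.out.println(clientConnectAddr(Map.of(), dn)); // prints 10.0.0.5:50010
        // Client option set: clients connect by hostname.
        System.out.println(clientConnectAddr(Map.of(CLIENT_USE_HOSTNAME, "true"), dn)); // prints dn1.example.com:50010
        // The client option does not affect Datanode-to-Datanode transfers.
        System.out.println(datanodeConnectAddr(Map.of(CLIENT_USE_HOSTNAME, "true"), dn)); // prints 10.0.0.5:50010
    }
}
```

With neither key set, both helpers return the IP form, matching "no change in the current behavior"; the protocol-visible identifier is never modified, which is why this approach avoids an incompatible client protocol change.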

--
This message is automatically generated by JIRA.
