hadoop-hdfs-issues mailing list archives

From "Tanping Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1875) MiniDFSCluster hard-codes dfs.datanode.address to localhost
Date Mon, 02 May 2011 22:49:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027895#comment-13027895 ]

Tanping Wang commented on HDFS-1875:
------------------------------------

I like this idea.  It would be really useful if we could have multiple simulated data nodes
bound to different hosts, and the DFS client bound to a particular host.  Further down the
road, we could also place some of the simulated data nodes on different hosts but in the same
rack.  We can use this to test issues related to network topology distance.

One related problem I ran into is that the data nodes in a LocatedBlock returned by the name
node are ordered by NetworkTopology#pseudoSortByDistance().  In the current MiniDFSCluster,
there is no way to bind the client to a host, or to bind a simulated data node to a particular
host/rack.  It would be nice if MiniDFSCluster made this possible, so that the network topology
distance from the client to each data node is fixed, and therefore the order of data nodes
within a LocatedBlock on a MiniDFSCluster is fixed.  Currently, the order of data nodes in a
LocatedBlock is random, because NetworkTopology does not see the DFSClient and the simulated
datanodes as being on different hosts and racks.
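To make the ordering point concrete, here is a minimal Java sketch under stated assumptions.  It
is not Hadoop's NetworkTopology#pseudoSortByDistance(); the Node record and the distance() and
sortByDistance() helpers are hypothetical, and the distances (0 for the same host, 2 for the same
rack, 4 for different racks) are an assumed two-level convention.  It only shows that once the
reader and every replica have a fixed host and rack, sorting by distance gives the same order on
every run.  Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model of rack-aware replica ordering.
// This is NOT Hadoop's NetworkTopology#pseudoSortByDistance().
class TopologySortSketch {

    // Hypothetical node location, printed in "/rack/host" form.
    record Node(String rack, String host) {
        String path() { return "/" + rack + "/" + host; }
    }

    // Assumed two-level distance convention:
    // 0 = same host, 2 = same rack, 4 = different racks.
    static int distance(Node a, Node b) {
        boolean sameRack = a.rack().equals(b.rack());
        if (sameRack && a.host().equals(b.host())) return 0;
        if (sameRack) return 2;
        return 4;
    }

    // Sort replicas by distance from the reader.  Ties are broken by path
    // so the result is deterministic for a fixed reader location.
    static List<Node> sortByDistance(Node reader, List<Node> replicas) {
        List<Node> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt((Node n) -> distance(reader, n))
                .thenComparing(Node::path));
        return sorted;
    }

    public static void main(String[] args) {
        // A client pinned to one host/rack, and three replicas on distinct hosts.
        Node client = new Node("rack1", "hostA");
        List<Node> replicas = List.of(
                new Node("rack2", "hostC"),
                new Node("rack1", "hostB"),
                new Node("rack1", "hostA"));
        for (Node n : sortByDistance(client, replicas)) {
            System.out.println(n.path() + " distance=" + distance(client, n));
        }
    }
}
```

With every simulated node on 127.0.0.1, all distances in such a model collapse to the same value,
which is why the observed order looks random rather than fixed.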

Also, Mini DFS currently provides the -racks option when starting data nodes.  However, we
cannot bind multiple simulated data nodes to one rack, so the option is not that useful.

> MiniDFSCluster hard-codes dfs.datanode.address to localhost
> -----------------------------------------------------------
>
>                 Key: HDFS-1875
>                 URL: https://issues.apache.org/jira/browse/HDFS-1875
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For: 0.23.0
>
>
> When creating RPC addresses that represent the communication sockets for each simulated
> DataNode, the MiniDFSCluster class hard-codes the address of the dfs.datanode.address port
> to be "127.0.0.1:0".
> The DataNodeCluster test tool uses the MiniDFSCluster class to create a selected number
> of simulated datanodes on a single host. In the DataNodeCluster setup, the NameNode is not
> simulated but is started as a separate daemon.
> The problem is that if the write requests to the simulated datanodes originate on a host
> other than the one running the simulated datanodes, the connections are refused. This is
> because the RPC sockets started by MiniDFSCluster are bound to "localhost" (127.0.0.1)
> and are not accessible from outside that machine.
> It is proposed that the MiniDFSCluster.setupDatanodeAddress() method be overloaded in
> order to accommodate an environment where the NameNode is on one host, the client is on
> another host, and the simulated DataNodes are on yet another host (or even multiple hosts
> simulating multiple DataNodes each).
> The overloaded API would add a parameter that would be used as the basis for creating
> the RPC sockets. By default, it would remain 127.0.0.1.
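The proposed overload can be sketched in Java as follows.  This is a hypothetical illustration,
not the actual MiniDFSCluster code: a plain Map stands in for Hadoop's Configuration, only the
dfs.datanode.address key comes from the issue, and the isLoopbackOnly() helper is invented here
to show why a "127.0.0.1:0" binding is reachable only from the local machine while a wildcard
host such as "0.0.0.0" listens on all interfaces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed setupDatanodeAddress() overload.
// A Map<String, String> stands in for Hadoop's Configuration.
class DatanodeAddressSketch {
    static final String DATANODE_ADDRESS_KEY = "dfs.datanode.address";
    static final String DEFAULT_HOST = "127.0.0.1";

    // Existing behaviour: keep the loopback default.
    static void setupDatanodeAddress(Map<String, String> conf) {
        setupDatanodeAddress(conf, DEFAULT_HOST);
    }

    // Proposed overload: the caller supplies the host; port 0 means "any free port".
    static void setupDatanodeAddress(Map<String, String> conf, String host) {
        conf.put(DATANODE_ADDRESS_KEY, host + ":0");
    }

    // Binds a server socket to the configured host:port and reports whether the
    // bound address is loopback-only, i.e. unreachable from other machines.
    static boolean isLoopbackOnly(Map<String, String> conf) {
        String value = conf.get(DATANODE_ADDRESS_KEY);
        int colon = value.lastIndexOf(':');
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(host, port));
            return socket.getInetAddress().isLoopbackAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        setupDatanodeAddress(local);
        System.out.println(local.get(DATANODE_ADDRESS_KEY)
                + " loopback-only: " + isLoopbackOnly(local));

        Map<String, String> shared = new HashMap<>();
        setupDatanodeAddress(shared, "0.0.0.0"); // wildcard: listen on all interfaces
        System.out.println(shared.get(DATANODE_ADDRESS_KEY)
                + " loopback-only: " + isLoopbackOnly(shared));
    }
}
```

Keeping the one-argument method as a thin wrapper preserves the existing 127.0.0.1 default for
current callers, while tools such as DataNodeCluster could pass a routable host instead.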

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
