hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-692) Rack-aware Replica Placement
Date Tue, 14 Nov 2006 20:04:41 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-692?page=comments#action_12449759 ] 
Hairong Kuang commented on HADOOP-692:

1. Network topology construction
    Should we consider a mobile network in which laptops running datanodes move around? If not,
once a datanode has started, the possibility that it moves to a different location is
slim. I would simply update the network topology when a datanode registers or exits.

The number of hops between hubs can be specified in a configuration file that the namenode
reads at startup time.

2. Network topology interface
   Doug, I like the interface that you described. But it looks like it cannot express the
case where nodes are connected by a switch and the distance between two nodes is 1. It also
needs a method to expose all nodes that belong to a hub.
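
To make points 1 and 2 concrete, here is a minimal sketch of a topology that is updated only on
datanode registration and exit, reports a distance of 1 for two nodes on the same switch, and
exposes all nodes that belong to a hub. Every name in it (TopologySketch, setHubHops, addNode,
removeNode, getNodesInHub, getDistance) is hypothetical and is not taken from Doug's proposed
interface. The distance rule (1 plus the configured number of hops between the two hubs) is
also an assumption made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and distance rule are assumptions, not the proposed interface.
class TopologySketch {
    // hub name -> datanodes attached to that hub
    private final Map<String, Set<String>> nodesByHub = new HashMap<>();
    // datanode -> hub it registered under
    private final Map<String, String> hubOfNode = new HashMap<>();
    // unordered hub pair -> hops between the hubs, as would be read from configuration
    private final Map<String, Integer> hubHops = new HashMap<>();

    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    void setHubHops(String hubA, String hubB, int hops) {
        hubHops.put(key(hubA, hubB), hops);
    }

    // Called when a datanode registers; a re-registration replaces the old location.
    void addNode(String node, String hub) {
        removeNode(node);
        hubOfNode.put(node, hub);
        nodesByHub.computeIfAbsent(hub, h -> new TreeSet<>()).add(node);
    }

    // Called when a datanode exits; unknown nodes are ignored.
    void removeNode(String node) {
        String hub = hubOfNode.remove(node);
        if (hub == null) {
            return;
        }
        Set<String> nodes = nodesByHub.get(hub);
        nodes.remove(node);
        if (nodes.isEmpty()) {
            nodesByHub.remove(hub);
        }
    }

    // Exposes all nodes that belong to a hub (read-only view).
    Set<String> getNodesInHub(String hub) {
        Set<String> nodes = nodesByHub.get(hub);
        if (nodes == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(nodes);
    }

    // Assumed rule: 0 for the same node, 1 for the same hub, else 1 + configured hub hops.
    int getDistance(String a, String b) {
        String hubA = hubOfNode.get(a);
        String hubB = hubOfNode.get(b);
        if (hubA == null || hubB == null) {
            throw new IllegalArgumentException("unregistered node");
        }
        if (a.equals(b)) {
            return 0;
        }
        if (hubA.equals(hubB)) {
            return 1;
        }
        Integer hops = hubHops.get(key(hubA, hubB));
        if (hops == null) {
            throw new IllegalStateException("no hop count configured for " + key(hubA, hubB));
        }
        return 1 + hops;
    }
}
```

With two hubs configured 2 hops apart, two nodes on the same hub are at distance 1 and nodes on
different hubs are at distance 3 under this assumed rule.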

3. Block placement strategy
Allocating all blocks of a file to the same 3 racks limits the aggregate read bandwidth. I
do not see much benefit in it.

I am thinking of allowing users to specify a replica placement policy at runtime when they set the
replication factor of a file. They can use any predefined placement policy or supply a user-defined
placement policy.

Users may specify their replica placement policy using a declarative language. Something like:

replica 1: same node
replica 2: same rack
replica 3: different rack
others: random
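
As a rough illustration, a policy written in the form above could be parsed into a per-replica
lookup as sketched below. The class name PlacementPolicySketch, the Constraint enum, and the
choice of "random" as the default when no "others" line is given are all assumptions. The
sketch only parses the text; it does not decide what "same node" or "same rack" is relative to.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for the declarative policy text; not an existing Hadoop class.
class PlacementPolicySketch {
    enum Constraint { SAME_NODE, SAME_RACK, DIFFERENT_RACK, RANDOM }

    // replica number (1-based) -> constraint named on its "replica N:" line
    private final Map<Integer, Constraint> byReplica = new HashMap<>();
    // constraint for replicas without their own line (assumed default: RANDOM)
    private Constraint others = Constraint.RANDOM;

    static PlacementPolicySketch parse(String text) {
        PlacementPolicySketch policy = new PlacementPolicySketch();
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in: " + line);
            }
            String target = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            // "different rack" -> DIFFERENT_RACK; valueOf rejects unknown constraint names
            String name = line.substring(colon + 1).trim()
                    .toUpperCase(Locale.ROOT).replaceAll("\\s+", "_");
            Constraint constraint = Constraint.valueOf(name);
            if (target.equals("others")) {
                policy.others = constraint;
            } else if (target.startsWith("replica ")) {
                int number = Integer.parseInt(target.substring("replica ".length()).trim());
                policy.byReplica.put(number, constraint);
            } else {
                throw new IllegalArgumentException("unknown target: " + target);
            }
        }
        return policy;
    }

    // Returns the constraint for the given replica, falling back to the "others" rule.
    Constraint constraintFor(int replicaNumber) {
        return byReplica.getOrDefault(replicaNumber, others);
    }
}
```

A namenode-side chooser would then ask constraintFor(n) for each replica in turn and pick a
target that satisfies the returned constraint.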

Any comments?

> Rack-aware Replica Placement
> ----------------------------
>                 Key: HADOOP-692
>                 URL: http://issues.apache.org/jira/browse/HADOOP-692
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.9.0
> This issue assumes that HDFS runs on a cluster of computers that is spread across many racks.
Communication between two nodes on different racks needs to go through switches. Bandwidth
in/out of a rack may be less than the total bandwidth of machines in the rack. The purpose
of rack-aware replica placement is to improve data reliability, availability, and network
bandwidth utilization. The basic idea is that each data node determines which rack it belongs
to at startup time and notifies the name node of its rack id upon registration. The name
node maintains a rackid-to-datanode map and tries to place replicas across racks.
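
The basic idea in the description can be sketched as follows: a rackid-to-datanode map filled
at registration, plus a chooser that spreads replicas across racks. The class and method names
(RackMapSketch, register, chooseTargets) are hypothetical, and the round-robin selection is
only one simple way to spread replicas, not the placement policy proposed in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a rackid-to-datanode map with a simple cross-rack chooser.
class RackMapSketch {
    // rack id -> datanodes that reported that rack id when they registered
    private final Map<String, List<String>> nodesByRack = new LinkedHashMap<>();

    void register(String datanode, String rackId) {
        nodesByRack.computeIfAbsent(rackId, r -> new ArrayList<>()).add(datanode);
    }

    // Takes one node per rack per pass, so a rack is reused only after every
    // rack that still has unused nodes has contributed one. Returns fewer than
    // 'replication' targets if the cluster has too few datanodes.
    List<String> chooseTargets(int replication) {
        List<String> targets = new ArrayList<>();
        int round = 0;
        boolean added = true;
        while (targets.size() < replication && added) {
            added = false;
            for (List<String> nodes : nodesByRack.values()) {
                if (round < nodes.size() && targets.size() < replication) {
                    targets.add(nodes.get(round));
                    added = true;
                }
            }
            round++;
        }
        return targets;
    }
}
```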

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

