hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2559) DFS should place one replica per rack
Date Fri, 08 Feb 2008 17:55:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567129#action_12567129 ]

Runping Qi commented on HADOOP-2559:
------------------------------------


A few folks are concerned about write performance if the three replicas are placed on three
different racks.
Arun and Owen proposed a compromise as follows:

The DFS client should place one replica on a random node of its local rack, if that is
possible, and then choose a remote rack at random and place the other two replicas on two
random nodes of that remote rack. This gives us a pretty good distribution and needs only
one remote-rack write.

Arun and Owen, please correct me if the above quote is inaccurate.

I like this proposal.
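
For concreteness, here is a rough sketch of the proposed placement in plain Java. This is
illustrative only, not actual DFS code; the map from rack name to its datanodes and the class
and method names are made up for the example.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch of the proposed placement, not the actual DFS implementation.
public class ProposedPlacementSketch {
    private final Random rand = new Random();

    // Pick 3 targets: one on the writer's rack, two on a single randomly chosen remote rack.
    public List<String> chooseTargets(String localRack, Map<String, List<String>> nodesByRack) {
        List<String> targets = new ArrayList<>();

        // Replica 1: a random node on the writer's local rack, if one is available.
        List<String> localNodes = nodesByRack.get(localRack);
        if (localNodes != null && !localNodes.isEmpty()) {
            targets.add(localNodes.get(rand.nextInt(localNodes.size())));
        }

        // Choose one remote rack at random; replicas 2 and 3 go to two distinct random
        // nodes on that rack, so the write pipeline crosses a rack boundary only once.
        List<String> remoteRacks = new ArrayList<>(nodesByRack.keySet());
        remoteRacks.remove(localRack);
        if (!remoteRacks.isEmpty()) {
            String remoteRack = remoteRacks.get(rand.nextInt(remoteRacks.size()));
            List<String> remoteNodes = new ArrayList<>(nodesByRack.get(remoteRack));
            for (int i = 0; i < 2 && !remoteNodes.isEmpty(); i++) {
                targets.add(remoteNodes.remove(rand.nextInt(remoteNodes.size())));
            }
        }
        return targets;
    }
}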




> DFS should place one replica per rack
> -------------------------------------
>
>                 Key: HADOOP-2559
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2559
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: lohit vijayarenu
>
> Currently, when writing out a block, DFS will place one copy on a local data node, one
> copy on a rack-local node, and another one on a remote node. This leads to a number of
> undesired properties:
> 1. The block will be rack-local to two racks instead of three, reducing the advantage
> of rack-locality-based scheduling by 1/3.
> 2. The blocks of a file (especially a large file) are unevenly distributed over the nodes:
> one third will be on the local node, and two thirds on the nodes of the same rack. This may
> make some nodes fill up much faster than others, increasing the need for rebalancing.
> Furthermore, this also makes some nodes become "hot spots" if those big files are popular
> and accessed by many applications.
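
To make the skew described above concrete, here is a back-of-the-envelope count for one node
writing a single large file under the current policy. The block count is made up; only the
ratios matter.

// Illustrative arithmetic only: replica counts for one writer under the current
// policy (1 copy on the writer, 1 copy rack-local, 1 copy on a remote rack).
public class SkewExample {
    public static void main(String[] args) {
        long blocks = 3000;                       // blocks in one large file (made up)
        long totalReplicas = blocks * 3;          // 9000 replicas overall

        long onWriterNode = blocks;               // replica 1 of every block -> 1/3
        long onWritersRack = 2 * blocks;          // replicas 1 and 2 of every block -> 2/3
        long onRemoteRacks = blocks;              // replica 3 of every block -> 1/3

        System.out.printf("writer node:   %d / %d (%.0f%%)%n",
                onWriterNode, totalReplicas, 100.0 * onWriterNode / totalReplicas);
        System.out.printf("writer's rack: %d / %d (%.0f%%)%n",
                onWritersRack, totalReplicas, 100.0 * onWritersRack / totalReplicas);
        System.out.printf("remote racks:  %d / %d (%.0f%%)%n",
                onRemoteRacks, totalReplicas, 100.0 * onRemoteRacks / totalReplicas);
    }
}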

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

