hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3799) Design a pluggable interface to place replicas of blocks in HDFS
Date Wed, 13 May 2009 05:46:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708766#action_12708766 ]

dhruba borthakur commented on HADOOP-3799:

> Especially since in order to be stable under the rebalancer

Oh guys, you are going too far! I am talking about a faster cycle of innovation and iteration.
A pluggable interface allows the Hadoop community to experiment with newer methods of
block placement. Only once such a placement algorithm proves beneficial and helpful does the
related question of "how to make the balancer work with the new placement policy" arise.
If experiments show that there is no viable alternative pluggable policy, then the question
of "does the balancer work with a pluggable policy" is moot.

> hdfs probably needs to store metadata with the files or blocks

I do not like this approach. It makes HDFS heavy, clunky and difficult to maintain. Have you
seen what happened to other file systems that tried to do everything internally, e.g. DCE-DFS?
It is possible that HDFS might allow generic blobs to be stored with files (aka extended
file attributes) where application-specific data can be kept. But that should be disassociated
from a "requirement" that the archival policy has to be stored with file metadata.

Again folks, I agree completely with you that a "finished product" needs to encompass the
"balancer". But to start experimenting to figure out whether a different placement policy
is beneficial at all, I need the pluggability feature; otherwise I have to keep changing
my Hadoop source code every time I want to experiment. My experiments will probably take
three to six months, especially because I want to benchmark results at large scale.

For installations that go with the default policy, there is no impact at all.
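To make the idea concrete, here is a minimal sketch of what such a pluggable contract might look like. All names here (DatanodeInfo, BlockPlacementPolicy, chooseTargets) are hypothetical stand-ins for illustration; the actual interface proposed for HADOOP-3799 is in the attached BlockPlacementPluggable.txt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a datanode descriptor.
class DatanodeInfo {
    final String host;
    final String rack;
    DatanodeInfo(String host, String rack) { this.host = host; this.rack = rack; }
}

// The pluggable contract: given the writer and the cluster state,
// choose the datanodes that will hold the block's replicas.
interface BlockPlacementPolicy {
    List<DatanodeInfo> chooseTargets(String writerHost,
                                     List<DatanodeInfo> cluster,
                                     int numReplicas);
}

// A default-style policy: first replica on the writer's node,
// remaining replicas on nodes of a different rack.
class DefaultPlacementPolicy implements BlockPlacementPolicy {
    public List<DatanodeInfo> chooseTargets(String writerHost,
                                            List<DatanodeInfo> cluster,
                                            int numReplicas) {
        List<DatanodeInfo> targets = new ArrayList<DatanodeInfo>();
        String localRack = null;
        // First replica: the writer's own node, if it is in the cluster.
        for (DatanodeInfo d : cluster) {
            if (d.host.equals(writerHost)) {
                targets.add(d);
                localRack = d.rack;
                break;
            }
        }
        // Remaining replicas: nodes on a rack other than the writer's.
        for (DatanodeInfo d : cluster) {
            if (targets.size() >= numReplicas) break;
            if (localRack == null || !d.rack.equals(localRack)) {
                targets.add(d);
            }
        }
        return targets;
    }
}
```

An experimental policy would then be a second implementation of the same interface, selected by configuration, with no changes to the NameNode code that calls chooseTargets.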

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>                 Key: HADOOP-3799
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3799
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: BlockPlacementPluggable.txt
> The current HDFS code typically places one replica on the local rack, the second replica
> on a random remote rack, and the third replica on a random node of that remote rack. This
> algorithm is baked into the NameNode's code. It would be nice to make the block placement
> algorithm a pluggable interface. This would allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
