hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3799) Design a pluggable interface to place replicas of blocks in HDFS
Date Fri, 08 May 2009 13:25:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707337#action_12707337 ]

Tom White commented on HADOOP-3799:

Dhruba, these look like good changes - glad to see this moving forward. More comments below:

* Can BlockPlacementInterface be an abstract class? I would also change its name to not have
the "Interface" suffix, something like ReplicationPolicy, or BlockPlacementPolicy. ReplicationTargetChooser
could be renamed something like DoubleRackReplicationPolicy or DoubleRackBlockPlacementPolicy
or similar, to better describe its role.
* Why doesn't ReplicationPolicy simply pass through verifyBlockPlacement()? It seems odd that
it's doing extra work here.
* BlockPlacementInterface#chooseTarget(). Make excludedNodes a List<DatanodeDescriptor>.
Implementations may choose to turn it into a map if they need to, but for the interface, it
should just be a list, shouldn't it?
* For future evolution, can we pass a Configuration to the initialize() method, rather than
the considerLoad boolean?
* Rather than passing the full FSNamesystem to the initialize method, it would be preferable
to create an interface for the part that the block placement strategy needs. Something like
FSNamespaceStats, which only needs getTotalLoad() for the moment. I think this is an acceptable
use of an interface, since it is only used by developers writing a new block placement strategy.
There's a similar situation for job scheduling in MapReduce: JobTracker implements the package-private
TaskTrackerManager interface so that TaskScheduler doesn't have to pull in the whole JobTracker.
This helps a lot with testing.
* These changes should make it possible to unit test ReplicationTargetChooser directly. This
could be another Jira.
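
To make the suggestions above concrete, here is a minimal sketch of how the pieces might fit together. It is not taken from the attached patch. Conf and Node are hypothetical stand-ins for Hadoop's Configuration and DatanodeDescriptor so that the example compiles on its own, the method signatures are simplified assumptions, and FirstFitPolicy with its "example.considerLoad" key is a toy invented for illustration, not the existing placement algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlacementSketch {

    /** Hypothetical stand-in for Hadoop's Configuration; boolean values only. */
    static class Conf {
        private final Map<String, Boolean> values = new HashMap<>();
        void setBoolean(String key, boolean value) { values.put(key, value); }
        boolean getBoolean(String key, boolean defaultValue) {
            Boolean v = values.get(key);
            return v == null ? defaultValue : v;
        }
    }

    /** Hypothetical stand-in for DatanodeDescriptor. */
    static class Node {
        final String name;
        final int load;
        Node(String name, int load) { this.name = name; this.load = load; }
        @Override public String toString() { return name; }
    }

    /** The narrow view of FSNamesystem that a placement policy needs. */
    interface FSNamespaceStats {
        int getTotalLoad();
    }

    /** Abstract class with no "Interface" suffix, as suggested above. */
    abstract static class BlockPlacementPolicy {
        protected Conf conf;
        protected FSNamespaceStats stats;

        /** Takes a configuration object instead of a considerLoad boolean. */
        void initialize(Conf conf, FSNamespaceStats stats) {
            this.conf = conf;
            this.stats = stats;
        }

        /** excludedNodes is a plain List; implementations may index it if needed. */
        abstract List<Node> chooseTarget(int numReplicas, List<Node> candidates,
                                         List<Node> excludedNodes);
    }

    /** Toy policy (assumption): first fit, optionally skipping above-average load. */
    static class FirstFitPolicy extends BlockPlacementPolicy {
        static final String CONSIDER_LOAD_KEY = "example.considerLoad"; // hypothetical key

        @Override
        List<Node> chooseTarget(int numReplicas, List<Node> candidates,
                                List<Node> excludedNodes) {
            boolean considerLoad = conf.getBoolean(CONSIDER_LOAD_KEY, true);
            double avgLoad = candidates.isEmpty()
                    ? 0 : (double) stats.getTotalLoad() / candidates.size();
            List<Node> chosen = new ArrayList<>();
            for (Node n : candidates) {
                if (chosen.size() == numReplicas) break;
                if (excludedNodes.contains(n)) continue;
                if (considerLoad && n.load > avgLoad) continue;
                chosen.add(n);
            }
            return chosen;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a", 5), b = new Node("b", 25), c = new Node("c", 0);
        BlockPlacementPolicy policy = new FirstFitPolicy();
        // A lambda stands in for the namesystem, so no NameNode is needed here.
        policy.initialize(new Conf(), () -> 30);
        // Average load is 10: b is skipped as too busy, c is excluded. Prints [a]
        System.out.println(policy.chooseTarget(2, Arrays.asList(a, b, c), Arrays.asList(c)));
    }
}
```

Because the policy depends only on FSNamespaceStats, a test can supply a fixed total load without constructing an FSNamesystem, which is the same benefit TaskTrackerManager gives TaskScheduler.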

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>                 Key: HADOOP-3799
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3799
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: BlockPlacementPluggable.txt
> The current HDFS code typically places one replica on local rack, the second replica
> on remote random rack and the third replica on a random node of that remote rack. This algorithm
> is baked into the NameNode's code. It would be nice to make the block placement algorithm a
> pluggable interface. This will allow experimentation with different placement algorithms based
> on workloads, availability guarantees and failure models.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
