hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3564) Make the replication policy pluggable to allow custom replication policies
Date Thu, 28 Jun 2012 20:19:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403453#comment-13403453 ]

Sanjay Radia commented on HDFS-3564:

HDFS has assumed that the hierarchical network topology captures both distance and fault
domains. A rack represents a group of machines that are close to each other and also share a
fault domain. For VMs we simply needed to generalize the network topology to more levels, so
that vm-hosts are accounted for in both the distance and fault dimensions.

This JIRA suggests that the notion of fault domains can be orthogonal to the topology. Do
we need to change some internal abstractions, or is it sufficient to make the placement policy
pluggable? Not sure.
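The multi-level generalization described above can be sketched as path-based network locations, where the distance between two nodes is the number of hops up to their lowest common ancestor. The path format and level names below are illustrative only, not the actual HDFS NetworkTopology code:

```java
// Hedged sketch: network locations as slash-separated paths with an extra
// vm-host level between rack and node. Distance is derived from where the
// two paths first diverge. Names and levels are invented for illustration.
public class TopologyDistance {
    // Distance = sum of the depths each node must climb after the shared
    // prefix to reach the lowest common ancestor.
    static int distance(String a, String b) {
        String[] pa = a.split("/"), pb = b.split("/");
        int i = 0;
        while (i < pa.length && i < pb.length && pa[i].equals(pb[i])) i++;
        return (pa.length - i) + (pb.length - i);
    }

    public static void main(String[] args) {
        // Four levels: /datacenter/rack/vm-host/vm
        String vm1 = "/dc1/rack1/host1/vm1";
        String vm2 = "/dc1/rack1/host1/vm2"; // same physical host
        String vm3 = "/dc1/rack1/host2/vm1"; // same rack, different host
        String vm4 = "/dc1/rack2/host3/vm1"; // different rack
        System.out.println(distance(vm1, vm2)); // 2
        System.out.println(distance(vm1, vm3)); // 4
        System.out.println(distance(vm1, vm4)); // 6
    }
}
```

With the extra vm-host level, two VMs on the same physical host are "closer" than two VMs in the same rack, which also makes them a single fault domain that a placement policy can avoid.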

> Make the replication policy pluggable to allow custom replication policies
> --------------------------------------------------------------------------
>                 Key: HDFS-3564
>                 URL: https://issues.apache.org/jira/browse/HDFS-3564
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Sumadhur Reddy Bolli
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> ReplicationTargetChooser currently determines the placement of replicas in Hadoop. Making
> the replication policy pluggable would help in having custom replication policies that suit
> the environment.
> Eg1: Enabling placing replicas across different datacenters (not just racks)
> Eg2: Enabling placing replicas across multiple (more than 2) racks
> Eg3: Cloud environments like Azure have logical concepts like fault and upgrade domains.
> Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault
> domains. Machines are typically spread evenly across both fault and upgrade domains. Fault
> domain failures are typically catastrophic/unplanned failures, and the possibility of data
> loss is high. An upgrade domain can be taken down by Azure for maintenance periodically. Each
> time an upgrade domain is taken down, a small percentage of machines in the upgrade domain
> (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default
> replication factor of 3, any 3 data nodes going down at the same time would mean potential
> data loss. So it is important to have a policy that spreads replicas across both fault and
> upgrade domains to ensure practically no data loss. The problem here is two-dimensional, and
> the default policy in Hadoop is one-dimensional. Custom policies to address issues like these
> can be written if we make the policy pluggable.
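As a rough illustration of the two-dimensional spreading described in Eg3, a pluggable policy might greedily pick targets in pairwise-distinct fault and upgrade domains, relaxing the constraint only when the cluster is too small. This is a minimal sketch with invented Node and chooseTargets types, not the actual HDFS BlockPlacementPolicy API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a two-dimensional placement policy: spread replicas
// so no two share a fault domain or an upgrade domain. Types and method names
// are invented for illustration.
class Node {
    final String name;
    final int faultDomain;
    final int upgradeDomain;
    Node(String name, int fd, int ud) {
        this.name = name;
        this.faultDomain = fd;
        this.upgradeDomain = ud;
    }
}

class TwoDimensionalPlacementPolicy {
    // Greedily choose `replicas` nodes with pairwise-distinct fault and
    // upgrade domains; fall back to any remaining node if the cluster is
    // too small to satisfy both constraints.
    List<Node> chooseTargets(List<Node> candidates, int replicas) {
        List<Node> chosen = new ArrayList<>();
        Set<Integer> usedFd = new HashSet<>();
        Set<Integer> usedUd = new HashSet<>();
        for (Node n : candidates) {
            if (chosen.size() == replicas) break;
            if (!usedFd.contains(n.faultDomain) && !usedUd.contains(n.upgradeDomain)) {
                chosen.add(n);
                usedFd.add(n.faultDomain);
                usedUd.add(n.upgradeDomain);
            }
        }
        // Relax the domain constraints if not enough distinct nodes exist.
        for (Node n : candidates) {
            if (chosen.size() == replicas) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }
}

public class Main {
    public static void main(String[] args) {
        // 3 fault domains x 3 upgrade domains, two nodes per fault domain.
        List<Node> cluster = Arrays.asList(
            new Node("dn1", 0, 0), new Node("dn2", 0, 1),
            new Node("dn3", 1, 1), new Node("dn4", 1, 2),
            new Node("dn5", 2, 2), new Node("dn6", 2, 0));
        List<Node> targets = new TwoDimensionalPlacementPolicy().chooseTargets(cluster, 3);
        for (Node n : targets)
            System.out.println(n.name + " fd=" + n.faultDomain + " ud=" + n.upgradeDomain);
    }
}
```

A real policy would of course also weigh network distance, load, and available space, but the example shows why the choice is two-dimensional: dn2 is skipped even though its upgrade domain is free, because its fault domain is already covered.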

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

