hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-14786) A new block placement policy tolerating availability zone failure
Date Wed, 28 Aug 2019 00:58:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-14786:
---------------------------------
    Description: 
{{NetworkTopology}} assumes a 3-layer "/datacenter/rack/host" topology. The default block
placement policies are rack aware for better fault tolerance. Newer block placement policies
such as {{BlockPlacementPolicyRackFaultTolerant}} try their best to spread replicas across as
many racks as possible, which tolerates more rack failures. HADOOP-8470 introduced
{{NetworkTopologyWithNodeGroup}}, which adds another layer under rack, i.e. a 4-layer
"/datacenter/rack/host/nodegroup" topology. With that, replicas within a rack can be placed
in different node groups for better isolation.

Existing block placement policies tolerate one rack failure since at least two racks are chosen
in those cases. Still, all replicas may be placed in the same datacenter even when the cluster
topology spans multiple datacenters. In other words, failures at layers above the rack are
not well tolerated.

Meanwhile, more deployments in the public cloud are using multiple availability zones (AZs)
for high availability, since inter-AZ latency seems affordable in many cases. Within a single AZ,
some cloud providers such as AWS support [partitioned placement groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition],
which are essentially different racks. A simple network topology mapped to HDFS is the 3-layer
"/availabilityzone/rack/host".
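
As an illustration of the mapping above, here is a minimal sketch of a topology script that returns "/availabilityzone/rack" locations. It assumes the mapping is supplied through the standard script mechanism ({{net.topology.script.file.name}}), where the script receives hostnames or IPs as arguments and prints one location per argument. The inventory, IP addresses, AZ names and partition names are all hypothetical, and this is not part of the proposed patch.

```python
import sys

# Hypothetical inventory: host IP -> (availability zone, partition used as rack).
HOST_LOCATIONS = {
    "10.0.1.11": ("us-west-2a", "partition-1"),
    "10.0.1.12": ("us-west-2a", "partition-2"),
    "10.0.2.11": ("us-west-2b", "partition-1"),
}

# Location returned for hosts that are missing from the inventory.
DEFAULT_LOCATION = "/default-az/default-rack"


def resolve(host):
    """Return the "/availabilityzone/rack" location for one host."""
    location = HOST_LOCATIONS.get(host)
    if location is None:
        return DEFAULT_LOCATION
    az, rack = location
    return f"/{az}/{rack}"


def main(argv):
    # One location per argument, separated by whitespace, in argument order.
    print(" ".join(resolve(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The host layer is not part of the script output; the script only reports where each host sits, so the AZ takes the place of the datacenter layer.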

To achieve high availability that tolerates zone failure, this JIRA proposes a new data placement
policy that tries its best to place replicas across as many AZs and as many racks as possible,
distributed as evenly as possible.

Examples with 3 replicas, where racks are chosen as follows:
 # 1 AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} and spread replicas across as many racks as possible
 # 2 AZs: randomly choose one rack in one AZ and two racks in the other AZ
 # 3 AZs: randomly choose one rack in each AZ
 # 4 AZs: randomly choose three AZs and one rack in each of them
 After racks are chosen, hosts are chosen randomly, honoring local storage, favored nodes,
excluded nodes, storage types, etc.
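
The rack choices in these examples can be sketched as a round-robin over AZs taken in random order. This is only an illustration of the idea, not the proposed implementation: the function name {{choose_racks}}, the dict-based topology and the AZ and rack names are hypothetical, and host selection is left out.

```python
import random


def choose_racks(topology, replicas=3, rng=None):
    """Pick up to `replicas` distinct (az, rack) pairs, spread across AZs.

    `topology` maps an AZ name to the list of rack names in that AZ.
    """
    rng = rng or random.Random()
    # Shuffled copy of the racks in each non-empty AZ, so picks are random.
    remaining = {
        az: rng.sample(racks, len(racks))
        for az, racks in topology.items()
        if racks
    }
    az_order = list(remaining)
    rng.shuffle(az_order)

    chosen = []
    # Each pass takes at most one rack per AZ, which keeps the spread even.
    while len(chosen) < replicas and any(remaining.values()):
        for az in az_order:
            if len(chosen) == replicas:
                break
            if remaining[az]:
                chosen.append((az, remaining[az].pop()))
    # If the cluster has fewer racks than replicas, fewer pairs are returned;
    # a real policy would have to reuse racks in that case.
    return chosen


if __name__ == "__main__":
    two_azs = {"az1": ["r1", "r2", "r3"], "az2": ["r4", "r5"]}
    print(choose_racks(two_azs))
```

With two AZs the first pass takes one rack from each AZ and the second pass takes one more rack from whichever AZ comes first in the random order, giving the 1 + 2 split described above. With a single AZ the same loop picks three different racks when they exist.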

Data may become imbalanced if the topology is very uneven across AZs. This does not seem to be
a problem, since infrastructure provisioning in the public cloud is more flexible than 1P.

  was:
{{NetworkTopology}} assumes "/datacenter/rack/host" 3 layer topology. Default block placement
policies are rack awareness for better fault tolerance Newer block placement policy like {{BlockPlacementPolicyRackFaultTolerant}}
tries its best to place the replicas to most racks, which further tolerates more racks failing.
[HADOOP-8470] brought {{NetworkTopologyWithNodeGroup}} to add another layer under rack, i.e.
"/datacenter/rack/host/nodegroup" 4 layer topology. With that, replicas within a rack can
be placed in different node groups for better isolation.

Existing block placement policies tolerate rack failure since at least two racks are chosen
in those cases. Chances are all replicas could be placed in the same datacenter, though there
are multiple data centers in the same cluster topology. In other words, fault of higher layers
beyond rack is not well tolerated.

However, more deployments in public cloud are leveraging multiple available zones (AZ) for
high-availability since the inter-AZ latency seems affordable in many cases. In a single AZ,
some cloud providers like AWS support [partitioned placement groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
which basically are different racks. A simple network topology mapped to HDFS is "/availabilityzone/rack/host"
3 layers.

To achieve high availability tolerating zone failure, this JIRA proposes a new data placement
policy which tries its best to place replicas in most AZs, most racks, and most evenly distributed.

Examples with 3 replicas, we choose racks as following:
# 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to most racks
# 2AZ: randomly choose one rack in 1st AZ and randomly choose two racks in the other AZ
# 3AZ: randomly choose one rack in one AZ
# 4AZ: randomly choose three AZ and one rack in each AZ
After racks are chosen, hosts are chosen randomly honoring local storage, favorite nodes,
excluded nodes, storage types etc.

Data may become imbalance if topology is very uneven in AZs. This seems not a problem as in
public cloud, infrastructure provisioning is more flexible than 1P.


> A new block placement policy tolerating availability zone failure
> -----------------------------------------------------------------
>
>                 Key: HDFS-14786
>                 URL: https://issues.apache.org/jira/browse/HDFS-14786
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement
>            Reporter: Mingliang Liu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
