hadoop-hdfs-dev mailing list archives

From <Milind.Bhandar...@emc.com>
Subject Re: Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.
Date Mon, 04 Jun 2012 19:00:45 GMT
That's great, Junping.

Hoping to see this in trunk / hadoop 2.0 and hadoop 1.1 soon.

- milind

On Jun 4, 2012, at 8:48 AM, Jun Ping Du wrote:

> Hello Folks,
>      I just filed an umbrella JIRA today to address a limitation of the current NetworkTopology: it is bound strictly to a three-tier network. The motivation is to make Hadoop more flexible across deployment topologies (especially cloud/virtualization) and more configurable in data-locality policies such as replica placement, task scheduling, choosing a block for DFSClient reads, and balancing.
>      We submitted a draft proposal in this umbrella issue along with the implementation code. Because the code base is large (~260K), the code is split into 7 sub-JIRA issues, which should be more convenient for review. However, we split the code by functionality, which creates some dependencies between patches, and we are not sure this is the best approach. Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to improve Hadoop for these new deployment scenarios.
>      Hope this is a good start.
> Cheers,
> Junping
> ----- Original Message -----
> From: "Junping Du (JIRA)" <jira@apache.org>
> To: common-issues@hadoop.apache.org
> Sent: Monday, June 4, 2012 12:09:22 PM
> Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies
> Junping Du created HADOOP-8468:
> ----------------------------------
>             Summary: Umbrella of enhancements to support different failure and locality topologies
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 2.0.0-alpha, 1.0.0
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
> The current Hadoop network topology (described in earlier issues such as HADOOP-692) worked well for the classic three-tier network when it was introduced. However, it does not take into account other failure models or infrastructure changes that can affect network bandwidth efficiency, such as virtualization.
> A virtualized platform has the following characteristics, which the Hadoop topology should not ignore when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
> 1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
> 2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.
> Thus, we propose to make the Hadoop network topology extensible and to introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure based on a virtualized environment.
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
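
As a rough illustration of the node group idea in the quoted proposal, the Python sketch below models a location path with an extra level between rack and node. It is not taken from the HADOOP-8468 patches: the path layout `/rack/nodegroup/node`, the function names, and the sample hosts are all assumptions made for this example. Distance is counted as the number of hops from each node up to the closest common ancestor, and the placement check flags any two replicas that share a node group (that is, the same physical host).

```python
from typing import List, Sequence


def parse_location(path: str) -> List[str]:
    """Split a location such as '/rack1/ng1/vm1' into its components."""
    return [part for part in path.split("/") if part]


def distance(a: str, b: str) -> int:
    """Hops from both nodes up to their closest common ancestor.

    With the assumed /rack/nodegroup/node layout this gives
    0 (same node), 2 (same node group), 4 (same rack), 6 (different racks).
    """
    pa, pb = parse_location(a), parse_location(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def node_group(path: str) -> str:
    """Return the node group part of a location, e.g. '/rack1/ng1'."""
    return "/" + "/".join(parse_location(path)[:-1])


def shares_node_group(replicas: Sequence[str]) -> bool:
    """True if any two replicas sit in the same node group (same host)."""
    groups = [node_group(r) for r in replicas]
    return len(set(groups)) < len(groups)
```

Under these assumptions, `distance("/rack1/ng1/vm1", "/rack1/ng1/vm2")` is 2, which reflects point 2 above (VMs on one host are the closest neighbors), while `shares_node_group(["/rack1/ng1/vm1", "/rack1/ng1/vm2"])` is `True`, which is the placement that point 1 says should be avoided.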
