hadoop-common-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies
Date Fri, 08 Jun 2012 07:05:23 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291584#comment-13291584 ]

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

Junping, I went over the design document. It is pretty comprehensive. A few comments on the
design.

# Conceptually you are extending the current Network Topology by introducing a new layer of leaf
nodes. The current topology assumes that physical nodes are the leaves of the hierarchy, and you
add virtual nodes that can reside on physical nodes. I think this is a more logical way to
look at the new topology than saying that you introduce a second layer (node groups)
over the nodes, as the document does.
# The document should clarify how local storage is used by VMs on a physical box. I think
the assumption is that VMs never share storage resources. Otherwise there could be a reporting
problem: if two VMs share a drive and each sends a DF report to the NameNode, then the
drive's capacity will be counted twice. I'd recommend updating the pictures
and adding a section on how DNs' resources are reported to the NN, so that this issue is explicitly
covered in the design.
# For block replication there are 3 policies to consider:
#* block placement policy, when a new block is created
#* block replication policy, when under-replicated blocks are recovered
#* replica removal policy, when replicas are removed for over-replicated blocks
You covered the first two, and probably need to look into the third as well.
For the first two it'd be good to write down the entire modified policy rather than just listing
the differences.
_And make sure they converge to the existing policies if the virtual node layer is not defined._
# For YARN I am not convinced you will need to run multiple VMs per node, other than for the sake
of generality. It seems YARN should rely on the NodeManager to report resources and manage Containers
of a node as a whole. I am not sure how multiple VMs on a node can help here.
For MRv1, on the contrary, running multiple VMs per node can be useful for modeling variable
slots. In this case, again, the VMs should not share memory; otherwise reporting will go wrong.
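
The reporting concern in point 2 can be illustrated with a minimal Java sketch. This is not Hadoop code: `DfReportSketch`, `DfReport`, and `storageId` are hypothetical names, and the sketch assumes each report carries some identifier for the underlying physical drive, which the design may or may not provide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code.
final class DfReportSketch {

    // One disk-usage (DF) report; storageId is an assumed ID of the physical drive.
    static final class DfReport {
        final String dataNode;
        final String storageId;
        final long capacityBytes;

        DfReport(String dataNode, String storageId, long capacityBytes) {
            this.dataNode = dataNode;
            this.storageId = storageId;
            this.capacityBytes = capacityBytes;
        }
    }

    // Naive aggregation: every report is added, so a drive shared by two VMs counts twice.
    static long naiveCapacity(List<DfReport> reports) {
        long total = 0;
        for (DfReport r : reports) {
            total += r.capacityBytes;
        }
        return total;
    }

    // Aggregation keyed by physical drive: a shared drive is counted once.
    static long capacityPerDrive(List<DfReport> reports) {
        Map<String, Long> byDrive = new HashMap<>();
        for (DfReport r : reports) {
            byDrive.put(r.storageId, r.capacityBytes);
        }
        long total = 0;
        for (long capacity : byDrive.values()) {
            total += capacity;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two VMs on one physical box reporting the same drive.
        List<DfReport> reports = Arrays.asList(
            new DfReport("vm1", "disk0", 1000L),
            new DfReport("vm2", "disk0", 1000L));
        System.out.println("naive total:     " + naiveCapacity(reports));
        System.out.println("per-drive total: " + capacityPerDrive(reports));
    }
}
```

If VMs never share storage, as the comment assumes, the two totals agree and no deduplication is needed; the sketch only shows why the assumption should be stated explicitly.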
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for
enchanced failure and locality topologies.pdf
>
>
> The current Hadoop network topology (described in some previous issues like: Hadoop-692)
works well in the classic three-tier network for which it was introduced. However, it does not take into
account other failure models or changes in the infrastructure that can affect network bandwidth
efficiency, such as virtualization.
> A virtualized platform has the following characteristics that the Hadoop topology should not ignore
when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
> 1. VMs on the same physical host are affected by the same hardware failure. In order
to match the reliability of a physical deployment, replication of data across two virtual
machines on the same host should be avoided.
> 2. The network between VMs on the same physical host has higher throughput and lower
latency and does not consume any physical switch bandwidth.
> Thus, we propose to make the Hadoop network topology extensible and introduce a new level
in the hierarchical topology, a node group level, which maps well onto an infrastructure that
is based on a virtualized environment.
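
A minimal Java sketch of how a node group level could sit in the hierarchy, under stated assumptions. The path layout `/rack/nodegroup/node`, the class `NodeGroupTopologySketch`, and its method names are hypothetical and are not taken from the attached patches. Distance is counted as the number of hops from each node up to the closest common ancestor, so two VMs on the same host are closer than two nodes that only share a rack. The failure-domain check falls back to a plain same-node comparison when no node group layer is defined.

```java
// Hypothetical sketch, not Hadoop code.
final class NodeGroupTopologySketch {

    // Splits "/rack1/group1/vm1" into {"rack1", "group1", "vm1"}.
    static String[] components(String path) {
        return path.substring(1).split("/");
    }

    // Returns the path of the parent, e.g. "/rack1/group1" for "/rack1/group1/vm1".
    static String parent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Hops from each node up to their closest common ancestor, summed.
    static int distance(String a, String b) {
        String[] pa = components(a);
        String[] pb = components(b);
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // True when a replica on each node would be lost to the same hardware failure.
    // With a node group layer, nodes under the same node group (physical host) conflict.
    // Without it, only the identical node conflicts.
    static boolean sharesFailureDomain(String a, String b, boolean nodeGroupLayer) {
        if (nodeGroupLayer) {
            return parent(a).equals(parent(b));
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(distance("/r1/g1/vm1", "/r1/g1/vm2")); // same node group
        System.out.println(distance("/r1/g1/vm1", "/r1/g2/vm3")); // same rack
        System.out.println(distance("/r1/g1/vm1", "/r2/g3/vm4")); // different racks
        System.out.println(sharesFailureDomain("/r1/g1/vm1", "/r1/g1/vm2", true));
    }
}
```

A placement policy built on such a check would skip any candidate that shares a failure domain with an already chosen replica; the actual policy changes are described in the design document, not here.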

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
