hadoop-hdfs-user mailing list archives

From Jeffrey Buell <jbu...@vmware.com>
Subject RE: Moving disks from one datanode to another
Date Wed, 07 Dec 2011 20:21:10 GMT
Each physical host has 12 disks and I run 4 VMs with 3 disks dedicated to each.  I happen to
use Physical RDM (disks are passed through to the VMs), but this was done more for convenience
(I can easily switch to native instances using the same storage).  Using virtual disks on
each physical disk should have negligible overhead.  The important part is to effectively
partition the physical resources (this includes processors and memory as well as disks) among
the VMs.  So if you happen to have 2 replicas on different VMs on the same host, you still
have protection against any one disk or VM failing.  I think this is similar to what you're
thinking with multiple DNs.


> -----Original Message-----
> From: Charles Earl [mailto:charlescearl@me.com]
> Sent: Wednesday, December 07, 2011 10:46 AM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: Moving disks from one datanode to another
> Jeff,
> I'm interested in how you approach the issue of virtualization on Hadoop.
> In particular, I would like to have a VM launched as an environment
> which could in essence mount the local data node's disk (or replica).
> For my application, the users in essence want the map task running in a
> given virtualized environment, but have the task run against HDFS
> store.
> Conceptually, it would seem that you would want each VM to have
> separate physically mounted disk?
> When I've used a virtual disk, this has shown 30% worse performance on
> a write-oriented map than a physical disk mount. This was with KVM with
> virtio, in a simple test with randomwriter.
> I wonder if you had any suggestions in that regard.
> I'm actually just now compiling & testing a VM-based isolation module
> for Mesos (http://www.mesosproject.org/) in the hopes that this
> will address the need.
> The machine-as-rack paradigm seems quite interesting.
> Charles
> On Dec 7, 2011, at 1:21 PM, Jeffrey Buell wrote:
> >> I found a way:
> >>
> >> 1) Configure second datanode with a set of fresh empty directories.
> >> 2) Start second datanode, let it register with namenode.
> >> 3) Shut down first and second datanode, then move blk* and subdir dirs
> >> from data dirs of first node to data dirs of second datanode.
> >> 4) Start first and second datanode.
> >>
> >> This seems to work as intended. However, after some thinking I came to
> >> worry about the replication. HDFS will now consider the two datanode
> >> instances on the same host as two different hosts, which may cause
> >> replication to put two copies of the same file on the same host.
> >>
> >> It's probably not going to happen very often given that there's some
> >> randomness involved. And in my case there's always a third copy on
> >> another rack.
> >>
> >> Still, it's less than optimal. Are there any ways to fool HDFS into
> >> always placing all copies on different physical hosts in this rather
> >> messed up configuration?
> >>
> >> Thanks,
> >> \EF
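A minimal dry-run sketch of the four steps above. The paths are hypothetical, hadoop-daemon.sh is assumed to be on the PATH, and with two datanodes on one host each daemon would in practice get its own `--config` directory; `DRY_RUN=1` only prints the commands:

```shell
#!/bin/sh
# Sketch of the block-move procedure (hypothetical paths, assumed layout).
# Steps 1-2 are assumed done: the second datanode was configured with
# fresh empty dirs, started once, and registered with the namenode.
DRY_RUN=${DRY_RUN:-1}
SRC=/data/dn1/current      # data dir of the first datanode (assumed)
DST=/data/dn2/current      # fresh data dir of the second datanode (assumed)

# Print the command in dry-run mode, otherwise execute it.
run() { [ "$DRY_RUN" = 1 ] && echo "+ $*" || "$@"; }

# 3) stop both datanodes (repeat with each DN's --config dir),
#    then move block files and subdirs across
run hadoop-daemon.sh stop datanode
run mv "$SRC"/blk_* "$SRC"/subdir* "$DST"/

# 4) start both datanodes again
run hadoop-daemon.sh start datanode
```

Set `DRY_RUN=0` only after checking the printed commands against your actual dfs.data.dir layout.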
> >
> >
> > This is the same issue as for running multiple virtual machines on
> > each physical host.  I've found (on 0.20.2) that this gives
> > consistently better performance than a single VM or a native OS
> > instance (http://www.vmware.com/resources/techresources/10222), at
> > least for I/O-intensive apps.  I'm still investigating why, but one
> > possibility is that a datanode can't efficiently handle too many disks
> > (I have either 10 or 12 per physical host).  So I'm very interested in
> > seeing if multiple datanodes have a similar performance effect as
> > multiple VMs (each with one DN).
> >
> > Back to replication:  Hadoop doesn't know that the machines it's
> > running on might share a physical host, so there is a possibility that
> > 2 copies end up on the same host.  What I'm doing now is to define each
> > host as a rack, so the second copy is guaranteed to go to a different
> > host.  I have a single physical rack.  I'm tempted to call physical
> > racks "super racks" to distinguish them from logical racks.  A better
> > scheme may be to divide the physical rack into 2 logical racks, so that
> > most of the time the third copy goes on a different host than the
> > second.  I think that is the best that can be done today.  Ideally we
> > want to modify the block placement algorithm to recognize another level
> > in the topology hierarchy for the multiple VM/DN case.  A simpler
> > solution would be to add an option where the third copy is placed in a
> > third rack when available (and extended to n replicas on n racks
> > instead of random placement for n>3).  This would work for the single
> > physical rack case with each host defined as a rack for the topology.
> > Placing replicas on separate racks may be desirable for some
> > conventional configurations also (e.g., ones with good inter-rack
> > bandwidth).
> >
> > Jeff
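One way to get the "each physical host is a rack" mapping described above is a topology script, wired in via topology.script.file.name in core-site.xml on 0.20.x (net.topology.script.file.name on later releases). The hostnames, IP patterns, and rack names below are made up for illustration:

```shell
#!/bin/sh
# Hypothetical rack topology script: map every VM (or extra datanode)
# on the same physical host to one logical rack, so the default block
# placement policy keeps replicas on different physical hosts.
# Hadoop invokes the script with one or more DN addresses as arguments
# and expects one rack path per line in reply.
resolve_rack() {
  for arg in "$@"; do
    case "$arg" in
      vm-host1-*|10.1.1.1?) echo /rack-host1 ;;  # VMs on physical host 1
      vm-host2-*|10.1.1.2?) echo /rack-host2 ;;  # VMs on physical host 2
      *)                    echo /default-rack ;;
    esac
  done
}
resolve_rack "$@"
```

With this mapping, the second replica always lands on a different physical host; splitting the hosts into two logical racks instead (as suggested above) would also push the third replica off-host most of the time.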
