hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop on a Virtualized O/S vs. the Real O/S
Date Tue, 09 Feb 2010 12:13:50 GMT
Stephen Watt wrote:
> Hi Folks
> I need to be able to certify that Hadoop works on various operating
> systems. I do this by running it through a series of tests. As
> I'm sure you can empathize, obtaining all the machines for each test run
> can sometimes be tricky. It would be easier for me if I could spin up
> several instances of a virtual image of the desired O/S, but to do this, I
> need to know whether that approach carries any risks.
> Is there any reason why Hadoop might work differently on a virtual O/S as
> opposed to an actual O/S? Since just about everything is done
> through the JVM and SSH, I don't foresee any issues, and I don't believe
> we're doing anything weird with device drivers or have any kernel module
> dependencies.
> Kind regards
> Steve Watt

I run Hadoop on VMs.

- I/O performance can be below raw hardware rates, but the slowdown is
predictable.
- if you bring up a private network, you will have DNS/rDNS problems.
Hadoop is happy if every machine knows its own hostname and forward and
reverse DNS agree with it. Otherwise, edit the hosts tables (/etc/hosts)
on every machine.
- the big enemies on VMs are unexpected swapping and clock drift, which
screw up anything that assumes time moves forward at roughly the same
rate everywhere. ZooKeeper assumes this, as do most distributed
co-ordination systems. If you keep VM load low, allocate one virtual CPU
per physical one, and don't overallocate physical memory, most of these
problems go away.
- set the CPU affinity for each VM so it is always bound to the same CPU,
using taskset or the equivalent. This minimises cache misses and other
problems.
