hadoop-common-user mailing list archives

From "Colin Freas" <colinfr...@gmail.com>
Subject Re: Hadoop Distributed Virtualisation
Date Fri, 06 Jun 2008 17:03:20 GMT
The MR jobs I'm running are not CPU-intensive, so I've always assumed
that they're more IO-bound.  Maybe that's an exceptional situation, but I'm
not really sure.

What I'm picturing is a good motherboard with a local IO channel per disk,
each feeding an individual core, with memory partitioned between them.  I've
also heard good things about Intel's next tock vis-a-vis internal system
throughput.

And yes, this would be a task for a paravirtualization system like Xen.
Again, it's just a thought, but with low-end quad-core processors running
about $300, and the potential to cut the number of machines you need to
physically set up by 75%, I'm not sure I'd say it'd only be good for a proof
of concept.

Also, I just set up a dozen-odd boxes that are two generations behind modern
boxes, and promptly blew a fuse.  The TDP on the Xeon 3.06GHz chips I'm
using is 89W.  The TDP on an Intel Q6600 is 65W, and that covers 4 cores.
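
To put rough numbers on that, here is a back-of-the-envelope sketch using
only the TDP figures quoted above.  The worker count of 12, one single-core
Xeon per box, and one virtualized worker per Q6600 core are all assumptions
made for illustration, and TDP covers the CPU only, not disks, memory, or
power supply losses.

```python
# CPU-only power estimate from the TDP figures quoted in the message.
# Assumptions (not from the original): 12 workers, one single-core Xeon
# per old box, one virtualized worker per core on a quad-core box.
XEON_TDP_W = 89
Q6600_TDP_W = 65
CORES_PER_Q6600 = 4
WORKERS = 12


def cpu_power(workers, tdp_w, workers_per_chip):
    """Return (chips needed, total TDP in watts) for a number of workers."""
    chips = -(-workers // workers_per_chip)  # ceiling division
    return chips, chips * tdp_w


old_chips, old_watts = cpu_power(WORKERS, XEON_TDP_W, 1)
new_chips, new_watts = cpu_power(WORKERS, Q6600_TDP_W, CORES_PER_Q6600)
machine_reduction = 1 - new_chips / old_chips

print(f"Xeon boxes:  {old_chips} chips, {old_watts} W")
print(f"Q6600 boxes: {new_chips} chips, {new_watts} W")
print(f"Machines saved: {machine_reduction:.0%}")
```

Under those assumptions the machine count drops from 12 to 3, which is the
75% figure mentioned above.  Whether IO throughput holds up with four
workers per box is the part this arithmetic cannot answer.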

It's a simple experiment, but I don't have the resources on hand to run it.
I'm curious if anyone has seen the performance impact from the different
setups we're talking about.  I also think you could come close to faking it
with Hadoop config changes.

-Colin


On Fri, Jun 6, 2008 at 12:41 PM, Edward Capriolo <edlinuxguru@gmail.com>
wrote:

> I once asked a wise man in charge of a rather large multi-datacenter
> service, "Have you ever considered virtualization?" He replied, "All
> the CPUs here are pegged at 100%."
>
> There may be applications for this type of processing. I have thought
> about systems like this from time to time, and the thinking goes in
> circles. Hadoop is designed for storing and processing on different
> hardware, while virtualization lets you split one system into
> sub-systems.
>
> Virtualization is great for proof of concept.
> For example, I have deployed this: I installed VMware with two Linux
> systems on my Windows host and followed a Hadoop multi-system tutorial
> running on the two VMware nodes. I was able to get the word count
> application working. I also confirmed that blocks were indeed being
> stored on both virtual systems and that processing was being shared
> via MapReduce.
>
> The processing, however, was slow. Of course, this is the fault of
> VMware, which has a very high emulation overhead. Xen has less
> overhead. Linux-VServer and OpenVZ use operating-system-level
> virtualization (they have very little, almost no, overhead). Regardless
> of how much, overhead is overhead. Personally, I find that VMware falls
> short of its promises.
>
