hadoop-mapreduce-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Research projects for hadoop
Date Fri, 09 Sep 2011 18:24:27 GMT
Both Hadoop and virtualization are means to an end. That end is to consolidate workloads traditionally
deployed to separate servers so that the average utilization and ROI of a given server increase.

Companies looking to consolidate data-intensive computation may be better served by moving to a
Hadoop infrastructure than by a virtualization project. Let me give you an example:

> From: Saikat Kanjilal [mailto:sxk1969@hotmail.com]
> By assigning a virtual machine to a datanode, we effectively isolate 
> the datanode from the load on the machine caused by other processes, making the 
> datanode more responsive/reliable.


One can set up virtual partitions of CPU and RAM that are fairly independent of one another, but
attempting to stack I/O-intensive workloads on top of each other via virtualization is a recipe for
lower performance, negative ROI, and dissatisfied users.
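To see the disk side of that on a single node, here is a rough, self-contained Java sketch: stream
several large files at once from one disk and compare the aggregate throughput of one reader against
four. The /data/bench-0, /data/bench-1, ... paths are hypothetical placeholders; point them at files
much larger than RAM so the page cache does not hide the disk.

import java.io.FileInputStream;
import java.io.IOException;

public class IoContentionSketch {

    // Sequentially read one file to the end and return the byte count.
    static long readAll(String path) throws IOException {
        byte[] buf = new byte[1 << 20];            // 1 MB read buffer
        long total = 0;
        FileInputStream in = new FileInputStream(path);
        try {
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n;
            }
        } finally {
            in.close();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        final int streams = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        final long[] bytes = new long[streams];
        Thread[] readers = new Thread[streams];
        long start = System.nanoTime();
        for (int i = 0; i < streams; i++) {
            final int id = i;
            readers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        // Hypothetical test files: /data/bench-0, /data/bench-1, ...
                        bytes[id] = readAll("/data/bench-" + id);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            readers[i].start();
        }
        long total = 0;
        for (int i = 0; i < streams; i++) {
            readers[i].join();
            total += bytes[i];
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d concurrent stream(s): %.1f MB/s aggregate%n",
                streams, total / (1024.0 * 1024.0) / secs);
    }
}

On a typical single spinning disk the four-stream aggregate comes in below the single-stream rate,
because the interleaved sequential reads drift toward seek-bound I/O; that shortfall is roughly the
tax a set of I/O-hungry guests pays for sharing the spindle.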

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: "Segel, Mike" <msegel@navteq.com>
> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>;
> "mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>
> Cc: 
> Sent: Friday, September 9, 2011 10:45 AM
> Subject: RE: Research projects for hadoop
> 
> Why would you want to take a perfectly good machine and then try to virtualize 
> it?
> I mean if I have 4 quad-core CPUs, I can run a lot of simultaneous map tasks. 
> However, if I virtualize the box, I lose at least 1 core per VM, so I end up with 
> 4 nodes that have less capability and performance than I would have had with my 
> original box....
> 
> 
> -----Original Message-----
> From: Saikat Kanjilal [mailto:sxk1969@hotmail.com]
> Sent: Friday, September 09, 2011 10:59 AM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Research projects for hadoop
> 
> 
> Hi Folks,
> 
> I was looking through the following wiki page:
> http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if
> there's been any work done (or any interest to do work) on the following
> topics:
> 
> Integration of Virtualization (such as Xen) with Hadoop tools
> How does one integrate sandboxing of arbitrary user code in C++ and other
> languages in a VM such as Xen with the Hadoop framework? How does this
> interact with SGE, Torque, Condor? As each individual machine has more and
> more cores/CPUs, it makes sense to partition each machine into multiple
> virtual machines. That gives us a number of benefits:
> - By assigning a virtual machine to a datanode, we effectively isolate the
>   datanode from the load on the machine caused by other processes, making
>   the datanode more responsive/reliable.
> - With multiple virtual machines on each machine, we can lower the
>   granularity of HOD scheduling units, making it possible to schedule
>   multiple tasktrackers on the same machine, improving the overall
>   utilization of the whole cluster.
> - With virtualization, we can easily snapshot a virtual cluster before
>   releasing it, making it possible to re-activate the same cluster in the
>   future and start to work from the snapshot.
> 
> Provisioning of long-running Services via HOD
> Work on a computation model for services on the grid. The model would
> include:
> - Various tools for defining clients and servers of the service, and at
>   the least a C++ and Java instantiation of the abstractions
> - Logical definitions of how to partition work onto a set of servers,
>   i.e. a generalized shard implementation
> - A few useful abstractions like locks (exclusive and RW, fairness),
>   leader election, transactions
> - Various communication models for groups of servers belonging to a
>   service, such as broadcast, unicast, etc.
> - Tools for assuring QoS, reliability, managing pools of servers for a
>   service with spares, etc.
> - Integration with HDFS for persistence, as well as access to local
>   filesystems
> - Integration with ZooKeeper so that applications can use the namespace
> 
> I would like to either help out with a design for the above or prototype
> some code; please let me know if and what the process may be to move
> forward with this.
> Regards
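The "locks (exclusive and RW, fairness), leader election" item above maps directly onto ZooKeeper,
which the proposal already lists for integration. For leader election in particular, the standard
ZooKeeper recipe uses ephemeral sequential znodes: each participant creates one under an election
root, the lowest sequence number is the leader, and everyone else watches only the znode immediately
ahead of theirs. A minimal Java sketch of that recipe follows; the class name, the /election path,
and the localhost:2181 connection string are placeholders, not code from this thread.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LeaderElectionSketch implements Watcher {
    private static final String ROOT = "/election";    // placeholder election root
    private final CountDownLatch connected = new CountDownLatch(1);
    private final ZooKeeper zk;
    private String myNode;                              // e.g. /election/n_0000000007

    public LeaderElectionSketch(String hosts) throws Exception {
        zk = new ZooKeeper(hosts, 10000, this);
        connected.await();                              // wait for the session to come up
        try {
            zk.create(ROOT, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException e) {
            // another participant already created the root; that's fine
        }
        myNode = zk.create(ROOT + "/n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        check();
    }

    // Lowest sequence number wins; everyone else watches the znode just ahead
    // of theirs and re-checks when that znode disappears.
    private void check() throws Exception {
        List<String> children = zk.getChildren(ROOT, false);
        Collections.sort(children);
        int mine = children.indexOf(myNode.substring(ROOT.length() + 1));
        if (mine == 0) {
            System.out.println("became leader as " + myNode);
        } else {
            String predecessor = ROOT + "/" + children.get(mine - 1);
            Stat stat = zk.exists(predecessor, this);   // set a watch on the predecessor
            if (stat == null) {
                check();                                // it vanished already; re-evaluate
            } else {
                System.out.println(myNode + " is following " + predecessor);
            }
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
            connected.countDown();
        }
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                check();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new LeaderElectionSketch("localhost:2181");     // placeholder ensemble address
        Thread.sleep(Long.MAX_VALUE);                   // keep the session (and the znode) alive
    }
}

Watching only the predecessor, rather than the whole election root, means a failure wakes exactly
one successor instead of the entire group, which avoids a thundering-herd of re-elections.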
