From: "Segel, Mike"
To: "common-dev@hadoop.apache.org", "mapreduce-dev@hadoop.apache.org"
Date: Fri, 9 Sep 2011 12:45:38 -0500
Subject: RE: Research projects for hadoop

Why would you want to take a perfectly good machine and then try to virtualize it?

I mean if I have 4 quad core cpus, I can run a lot of simultaneous map tasks.
However, if I virtualize the box, I lose at least 1 core per VM so I end up with 4 nodes that have less capabilities and performance than I would have under my original box....

-----Original Message-----
From: Saikat Kanjilal [mailto:sxk1969@hotmail.com]
Sent: Friday, September 09, 2011 10:59 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: Research projects for hadoop

Hi Folks,

I was looking through the following wiki page: http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if there's been any work done (or any interest to do work) for the following topics:

Integration of Virtualization (such as Xen) with Hadoop tools

How does one integrate sandboxing of arbitrary user code in C++ and other languages in a VM such as Xen with the Hadoop framework? How does this interact with SGE, Torque, Condor?

As each individual machine has more and more cores/cpus, it makes sense to partition each machine into multiple virtual machines. That gives us a number of benefits:

- By assigning a virtual machine to a datanode, we effectively isolate the datanode from the load on the machine caused by other processes, making the datanode more responsive/reliable.
- With multiple virtual machines on each machine, we can lower the granularity of hod scheduling units, making it possible to schedule multiple tasktrackers on the same machine, improving the overall utilization of the whole clusters.
- With virtualization, we can easily snapshot a virtual cluster before releasing it, making it possible to re-activate the same cluster in the future and start to work from the snapshot.

Provisioning of long running Services via HOD

Work on a computation model for services on the grid. The model would include:

- Various tools for defining clients and servers of the service, and at the least a C++ and Java instantiation of the abstractions
- Logical definitions of how to partition work onto a set of servers, i.e.
a generalized shard implementation
- A few useful abstractions like locks (exclusive and RW, fairness), leader election, transactions
- Various communication models for groups of servers belonging to a service, such as broadcast, unicast, etc.
- Tools for assuring QoS, reliability, managing pools of servers for a service with spares, etc.
- Integration with HDFS for persistence, as well as access to local filesystems
- Integration with ZooKeeper so that applications can use the namespace

I would like to either help out with a design for the above or prototyping code, please let me know if and what the process may be to move forward with this.

Regards
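
The core-count argument in the reply at the top of this message can be put in numbers. The sketch below is only an illustration. It takes the reply's figures at face value (4 quad-core CPUs, 4 VMs, and a loss of 1 core per VM, which the reply gives as a lower bound) and additionally assumes that map-task capacity scales with the cores left over. The function name and its parameters are hypothetical and are not part of Hadoop.

```python
def usable_map_cores(sockets, cores_per_socket, vms=0, cores_lost_per_vm=1):
    """Cores left for map tasks on one physical box.

    Assumption (from the reply, not measured): each VM costs
    `cores_lost_per_vm` cores of overhead. With vms=0 the box is
    treated as bare metal and no cores are lost.
    """
    total_cores = sockets * cores_per_socket
    overhead = vms * cores_lost_per_vm
    # Never report a negative core count if the overhead exceeds the total.
    return max(total_cores - overhead, 0)


# Figures taken from the reply: 4 quad-core CPUs, split into 4 VMs.
bare_metal = usable_map_cores(sockets=4, cores_per_socket=4)
virtualized = usable_map_cores(sockets=4, cores_per_socket=4, vms=4)

print("bare metal cores:", bare_metal)
print("virtualized cores:", virtualized)
print("cores lost to virtualization:", bare_metal - virtualized)
```

Under these assumptions the box drops from 16 cores to 12, a loss of a quarter of its capacity, and a higher per-VM overhead would widen the gap. The sketch does not model the isolation, scheduling, or snapshot benefits listed in the original message, so it speaks only to raw core count.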