From: "Colin Freas"
Date: Fri, 6 Jun 2008 13:03:20 -0400
To: core-user@hadoop.apache.org
Subject: Re: Hadoop Distributed Virtualisation
References: <2d2102ba0806060730o3b2a7eb3m68886e5cf5973480@mail.gmail.com> <2d2102ba0806060919w42e485b8t16ed836fbf040ab7@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_4807_802868.1212771800068"

------=_Part_4807_802868.1212771800068
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

The MR jobs I'm running are not CPU intensive, so I've always assumed
that they're IO bound. Maybe that's an exceptional situation, but I'm
not really sure.

Picture a good motherboard with a local IO channel per disk, feeding
individual cores, with memory partitioned up between them... and I've
heard good things about Intel's next tock vis-a-vis internal system
throughput.

And yes, this would be a task for a paravirtualization system like Xen.

Again, it's just a thought, but with low-end quad-core procs running
about $300, and the potential to cut the number of machines you need to
physically set up by 75%, I'm not sure I'd say it'd only be good for a
proof of concept.

Also, I just set up a dozen-odd boxes that are two generations behind
modern boxes, and promptly blew a fuse. The TDP on the Xeon 3.06 GHz
chips I'm using is 89 W. The TDP on an Intel Q6600 is 65 W, and that
covers 4 cores.

It's a simple experiment, but I don't have the resources on hand to run it.
I'm curious whether anyone has measured the performance impact of the
different setups we're talking about. I also think you could come close
to faking it with Hadoop config changes.

-Colin

On Fri, Jun 6, 2008 at 12:41 PM, Edward Capriolo wrote:

> I once asked a wise man in charge of a rather large multi-datacenter
> service, "Have you ever considered virtualization?" He replied, "All
> the CPUs here are pegged at 100%."
>
> There may be applications for this type of processing. I have thought
> about systems like this from time to time. This thinking goes in
> circles. Hadoop is designed for storing and processing on different
> hardware. Virtualization lets you split a system into sub-systems.
>
> Virtualization is great for proof of concept.
> For example, I have deployed this: I installed VMware with two Linux
> systems on my Windows host and followed a Hadoop multi-system tutorial,
> running on two VMware nodes. I was able to get the word count
> application working. I also confirmed that blocks were indeed being
> stored on both virtual systems and that processing was being shared
> via Map/Reduce.
>
> The processing, however, was slow; of course, this is the fault of
> VMware. VMware has a very high emulation overhead. Xen has less
> overhead. Linux-VServer and OpenVZ use software virtualization (they
> have very little (almost no) overhead). Regardless of how much
> overhead, overhead is overhead. Personally, I find that VMware falls
> short of its promises.
>
------=_Part_4807_802868.1212771800068--
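
The consolidation and power argument in the reply above can be put into rough numbers. This is only a back-of-envelope sketch: the TDP figures are the ones quoted in the message and should be checked against Intel's datasheets, "a dozen-odd boxes" is taken as 12, and one CPU per box with one virtual node per core is an assumption the message does not state. All function and variable names are made up for illustration.

```python
# Back-of-envelope comparison of CPU TDP before and after consolidation.
# TDP values are as quoted in the message, not independently verified.
XEON_TDP_W = 89    # per Xeon 3.06 GHz chip, as quoted
Q6600_TDP_W = 65   # per Intel Q6600, as quoted
Q6600_CORES = 4    # one virtual node per core (assumption)


def boxes_after_consolidation(old_boxes: int, nodes_per_box: int) -> int:
    """Physical boxes needed if each hosts `nodes_per_box` virtual nodes."""
    return -(-old_boxes // nodes_per_box)  # ceiling division


def cpu_tdp_total(boxes: int, tdp_per_cpu_w: int, cpus_per_box: int = 1) -> int:
    """Summed CPU TDP in watts; ignores disks, memory, and PSU losses."""
    return boxes * cpus_per_box * tdp_per_cpu_w


old_boxes = 12  # "a dozen-odd boxes" taken as 12 (assumption)
new_boxes = boxes_after_consolidation(old_boxes, Q6600_CORES)

old_watts = cpu_tdp_total(old_boxes, XEON_TDP_W)
new_watts = cpu_tdp_total(new_boxes, Q6600_TDP_W)
machine_reduction = 1 - new_boxes / old_boxes

print(f"boxes: {old_boxes} -> {new_boxes} ({machine_reduction:.0%} fewer)")
print(f"CPU TDP: {old_watts} W -> {new_watts} W")
```

TDP is a thermal design figure rather than measured draw, so this only bounds the CPU share of the load on the circuit; it says nothing about whether four virtual nodes on one box would keep up on IO, which is the open question in the thread.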