From: "Edward Capriolo"
To: core-user@hadoop.apache.org
Date: Fri, 6 Jun 2008 12:41:18 -0400
Subject: Re: Hadoop Distributed Virtualisation
In-Reply-To: <2d2102ba0806060919w42e485b8t16ed836fbf040ab7@mail.gmail.com>

I once asked a wise man in charge of a rather large multi-datacenter service, "Have you ever considered virtualization?" He replied, "All the CPUs here are pegged at 100%."

There may be applications for this type of processing. I have thought about systems like this from time to time, but the thinking goes in circles: Hadoop is designed to spread storage and processing across separate hardware, while virtualization lets you split one system into sub-systems.

Virtualization is great for proof of concept. For example, I have deployed this: I installed VMware with two Linux systems on my Windows host and followed a Hadoop multi-system tutorial on the two VMware nodes. I was able to get the word count application working, and I confirmed that blocks were indeed being stored on both virtual systems and that processing was being shared via MapReduce.

The processing, however, was slow. Of course, this is the fault of VMware, which has a very high emulation overhead. Xen has less overhead.
Linux-VServer and OpenVZ use operating-system-level virtualization, so they have very little (almost no) overhead. Regardless of how much there is, overhead is overhead. Personally, I find that VMware falls short of its promises