From: Sid123
To: core-user@hadoop.apache.org
Date: Thu, 26 Mar 2009 16:38:17 -0700 (PDT)
Subject: How many nodes does one man want?

Hi,

I am working on implementing some machine learning algorithms using MapReduce.
I want to know: if I have data that takes 5-6 hours to train on a normal machine, will adding 2-3 more nodes make a difference? I read the following in the Yahoo Hadoop tutorial:

"Executing Hadoop on a limited amount of data on a small number of nodes may not demonstrate particularly stellar performance as the overhead involved in starting Hadoop programs is relatively high. Other parallel/distributed programming paradigms such as MPI (Message Passing Interface) may perform much better on two, four, or perhaps a dozen machines."

I have 3 laptops at my disposal, each with 4 GB of RAM and 150 GB of hard disk space. I have 600 MB of training data.
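To make the tradeoff in the quoted tutorial concrete, here is a rough back-of-envelope sketch, not a measurement. It assumes the training work divides perfectly across nodes and that every MapReduce job pays a fixed startup overhead; many machine learning algorithms run as one job per iteration, so that overhead is paid once per iteration. The function names and all parameter values (`iterations`, `overhead_hours_per_job`) are hypothetical and would need to be measured on the actual cluster.

```python
# Hypothetical cost model (illustrative assumptions only):
#   wall-clock = iterations * per-job startup overhead + total work / nodes
# It ignores data shuffling, stragglers, and network costs, so real
# speedups will be lower than what it predicts.

def estimated_hours(total_work_hours, nodes, iterations=1, overhead_hours_per_job=0.0):
    """Estimate wall-clock hours for a run split over `nodes` machines."""
    if nodes < 1 or iterations < 1:
        raise ValueError("nodes and iterations must both be at least 1")
    return iterations * overhead_hours_per_job + total_work_hours / nodes


def speedup(total_work_hours, nodes, iterations=1, overhead_hours_per_job=0.0):
    """Speedup relative to a single machine running with no Hadoop overhead."""
    return total_work_hours / estimated_hours(
        total_work_hours, nodes, iterations, overhead_hours_per_job
    )


if __name__ == "__main__":
    # Assumed: ~5.5 hours of single-machine work, 100 iterations,
    # 30 seconds of startup overhead per job (30 / 3600 hours).
    for n in range(1, 5):
        hours = estimated_hours(5.5, n, iterations=100, overhead_hours_per_job=30 / 3600)
        print(f"{n} node(s): ~{hours:.2f} h")
```

Under this model, extra nodes help as long as the divided work dominates, but the per-job overhead term does not shrink as nodes are added, which is the effect the tutorial warns about for small clusters.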