From: Aaron Kimball
Date: Sun, 12 Apr 2009 15:13:25 -0700
Subject: Re: Map-Reduce Slow Down
To: core-user@hadoop.apache.org
Reply-To: core-user@hadoop.apache.org
Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
In-Reply-To: <77f4f8890904120839n34ee0ad6mfd81f38a66c7ddba@mail.gmail.com>
Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one to each node, some of the nodes will simply go unused.

The best test for you to run is to use randomwriter to fill your cluster with several GB of random data and then run the sort program. If performance doesn't scale up from 3 nodes to 15, then you've definitely got something strange going on.

- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra wrote:

> Hey all
> I recently set up a three-node Hadoop cluster and ran an example on it. It
> was pretty fast, and all three nodes were being used (I checked the log
> files to make sure that the slaves were utilized).
>
> Now I've set up another cluster consisting of 15 nodes. I ran the same
> example, but instead of speeding up, the map-reduce task seems to take
> forever! The slaves are not being used for some reason. This second cluster
> has lower per-node processing power, but should that make any difference?
> How can I ensure that the data is being mapped to all the nodes? Presently,
> the only node that seems to be doing any work is the master node.
>
> Does having 15 nodes in a cluster increase the network cost? What can I do
> to set up the cluster to function more efficiently?
>
> Thanks!
> Mithila Nagendra
> Arizona State University
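[Editor's note: the randomwriter-then-sort benchmark suggested above can be invoked roughly as follows. This is a sketch for the 0.18/0.19-era Hadoop discussed in the thread; the examples jar name and the HDFS paths are assumptions, so adjust them to your installation.]

```shell
# Fill the cluster with random data (randomwriter writes ~10 GB per node
# by default, configurable via test.randomwrite.bytes_per_map and
# test.randomwrite.total_bytes). The output path is an assumption.
bin/hadoop jar hadoop-*-examples.jar randomwriter /benchmark/unsorted

# Sort the generated data. This exercises maps, shuffle, and reduces
# across the whole cluster.
bin/hadoop jar hadoop-*-examples.jar sort /benchmark/unsorted /benchmark/sorted
```

Timing the sort step on both the 3-node and the 15-node cluster gives a much fairer comparison than the small bundled examples, since every node should receive input splits to work on.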