From: Aaron Kimball
Date: Sun, 12 Apr 2009 15:13:25 -0700
Subject: Re: Map-Reduce Slow Down
To: core-user@hadoop.apache.org
Reply-To: core-user@hadoop.apache.org
Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
In-Reply-To: <77f4f8890904120839n34ee0ad6mfd81f38a66c7ddba@mail.gmail.com>
Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one to each node, some of the nodes will simply go unused.

The best test for you to run is to use randomwriter to fill your cluster with several GB of random data and then run the sort program. If performance doesn't scale up from 3 nodes to 15, then you've definitely got something strange going on.

- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra wrote:

> Hey all
> I recently set up a three-node Hadoop cluster and ran an example on it. It
> was pretty fast, and all three nodes were being used (I checked the log
> files to make sure that the slaves were utilized).
>
> Now I've set up another cluster consisting of 15 nodes. I ran the same
> example, but instead of speeding up, the map-reduce task seems to take
> forever! The slaves are not being used for some reason. This second cluster
> has lower per-node processing power, but should that make any difference?
> How can I ensure that the data is being mapped to all the nodes? Presently,
> the only node that seems to be doing any work is the master node.
>
> Does having 15 nodes in a cluster increase the network cost? What can I do
> to set up the cluster to function more efficiently?
>
> Thanks!
> Mithila Nagendra
> Arizona State University
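[Editor's note: the randomwriter-then-sort benchmark suggested above can be invoked roughly as follows. This is a sketch for the 0.18/0.19-era Hadoop discussed in the thread; the examples jar name and the HDFS paths are assumptions, so adjust them to your installation.]

```shell
# Fill the cluster with random data (randomwriter writes ~10 GB per node
# by default, configurable via test.randomwrite.bytes_per_map and
# test.randomwrite.total_bytes). The output path is an assumption.
bin/hadoop jar hadoop-*-examples.jar randomwriter /benchmark/unsorted

# Sort the generated data. This exercises maps, shuffle, and reduces
# across the whole cluster.
bin/hadoop jar hadoop-*-examples.jar sort /benchmark/unsorted /benchmark/sorted
```

Timing the sort step on both the 3-node and the 15-node cluster gives a much fairer comparison than the small bundled examples, since every node should receive input splits to work on.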