hadoop-mapreduce-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: How can I reduce the number of nodes used by a job
Date Fri, 16 Dec 2011 00:08:43 GMT
Hi Steve, there is no simple way to just limit the number of nodes, as it
would involve moving the data: you want to have the 3 replicas on the
5, 10, or 20 nodes, correct?

You could potentially just stop the TaskTrackers (TTs) on the extra nodes,
but your job(s) will likely have to fetch the data from remote nodes and
will run slower than it/they would on an actual cluster of that size.
Shutting down the DataNodes (DNs) will cause unnecessary re-replication and
redistribution of data (unless your data are small and you can afford to
reload the data or reformat HDFS each time).
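A rough back-of-the-envelope sketch of how much data would have to be read remotely if only k of the ~50 nodes keep their TaskTrackers. It assumes, unlike HDFS's actual rack-aware placement policy, that each block's 3 replicas land on 3 distinct nodes chosen uniformly at random; the function name is hypothetical.

```python
from math import comb


def fraction_without_local_replica(total_nodes: int, active_nodes: int,
                                   replication: int = 3) -> float:
    """Probability that none of a block's replicas is on an active node.

    Simplifying assumption: the `replication` replicas sit on distinct
    nodes chosen uniformly at random (real HDFS placement is rack-aware).
    """
    inactive = total_nodes - active_nodes
    # Ways to put all replicas on inactive nodes / all possible placements.
    return comb(inactive, replication) / comb(total_nodes, replication)


if __name__ == "__main__":
    for k in (5, 10, 20):
        remote = fraction_without_local_replica(50, k)
        print(f"{k:2d} active TaskTrackers: {remote:.1%} of blocks have no local replica")
```

Under that assumption, with 5, 10, and 20 active nodes roughly 72%, 50%, and 21% of blocks would have no local replica, so most map input would cross the network in the smaller configurations.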

Moving the computation to the data is a big part of MapReduce, and by
restricting the job to a subset of nodes one is likely to skew the results.

Dr. Alex Kozlov

On Thu, Dec 15, 2011 at 2:03 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

> I am reporting on the performance of a Hadoop task on a cluster with about 50
> nodes. I would like to be able to report performance on clusters of 5, 10, and
> 20 nodes without changing the current cluster. Is there a way to limit the
> number of nodes used by a job, and if so, how?
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
