hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: Question about controlling the number of mapper tasks - 0.15.0
Date Fri, 30 Nov 2007 18:32:42 GMT
Ah, I see my knowledge is now out of date -- http://wiki.apache.org/lucene-hadoop/HowToConfigure


On 11/30/07 10:30 AM, "Michael Bieniosek" <michael@powerset.com> wrote:

The value in hadoop-site.xml overrides the value set programmatically.

You can set values for mapred.map.tasks/mapred.reduce.tasks in mapred-default.xml instead of hadoop-site.xml
-- values set there serve as defaults that can be overridden programmatically.  However, mapred-default.xml
is due to be eliminated in 0.16, and I am not sure what the recommended approach will be then.
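For example, a defaults file along these lines (the property names are the ones used in this era of Hadoop; the values shown are just placeholders) leaves JobConf free to override the numbers per job:

```xml
<?xml version="1.0"?>
<!-- mapred-default.xml: defaults only. Anything set here can still be
     overridden per job via JobConf, unlike values in hadoop-site.xml,
     which win over programmatic settings. -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>8</value><!-- placeholder default -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value><!-- placeholder default -->
  </property>
</configuration>
```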


On 11/30/07 12:00 AM, "Jason Venner" <jason@attributor.com> wrote:

We have several 8 processor machines in our cluster, and for most of our
mapper tasks we would like to spawn 8 per machine.

We have one mapper task that is extremely resource intensive, for which we
can only spawn one per machine.

We do have multiple disk arms for our DFS, so we would like to run multiple
reduce tasks on each machine.

We have had little luck changing these parameters by setting the numbers
via JobConf:
jobConf.setNumMapTasks(int n)
jobConf.setNumReduceTasks(int n)

What we have ended up doing is reconfiguring the cluster by changing the
hadoop-site.xml between the different runs, which is awkward.
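One per-job alternative to editing hadoop-site.xml between runs is passing the overrides on the command line. This is a hypothetical invocation: it only works if the job's main class runs through ToolRunner so that generic options are parsed, and the -D generic option may not be available in 0.15 -- check your release before relying on it.

```
# Hypothetical: requires the driver to use ToolRunner/GenericOptionsParser,
# and -D generic-option support in your Hadoop release.
bin/hadoop jar our-job.jar com.example.OurJob \
    -D mapred.map.tasks=64 \
    -D mapred.reduce.tasks=16 \
    input-dir output-dir
```

Note that the values must not also be set in hadoop-site.xml, since that file overrides per-job settings.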

Have we just fumble-fingered it, or is there a way that we are missing to
set the concurrency for mappers and reducers on a per-job basis?
