hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum
Date Fri, 10 Feb 2012 12:42:00 GMT
Hi Rob,

On Fri, Feb 10, 2012 at 5:55 PM, Rob Stewart <robstewart57@gmail.com> wrote:
> I'm looking to clarify the relationship between
> MultithreadedMapper.setNumberOfThreads(i) and
> mapreduce.tasktracker.map.tasks.maximum .

The former is set inside the user application and controls the number
of threads that run map() calls inside a single mapper. These threads
all live _inside_ one JVM (a task, in Hadoop terms, is one complete JVM
running user code).

The latter controls, at a TaskTracker level, the max total number of
map-task JVMs that it can run concurrently at any given time.

> What about if I set:
> - MultithreadedMapper.setNumberOfThreads( 4 )
> - mapreduce.tasktracker.map.tasks.maximum = 4
> Will this mean that 4 map tasks are executed in 4 threads in one JVM,
> or will it mean that 4 JVMs be instantiated, each executing 4 map
> tasks in individual threads?

4 JVMs, provided your job has at least 4 map tasks (the number of map
tasks of a job depends on its input).

Each JVM will then run the MultithreadedMapper code, which in turn runs
4 threads that call your map(), because that is what you asked of it.
That gives up to 4 x 4 = 16 concurrent map() threads on that TaskTracker.
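
To make the arithmetic concrete, here is a minimal sketch in plain Java (no Hadoop dependency). The class and method names are hypothetical and exist only for illustration. It assumes a single TaskTracker, that every map task of the job is eligible to run on it, and that each task JVM uses exactly the configured number of MultithreadedMapper threads.

```java
/**
 * Illustrative model only: hypothetical helper, not part of Hadoop.
 * Assumes one TaskTracker and that all map tasks can be scheduled on it.
 */
public class MapConcurrency {

    /** Map-task JVMs running at once: capped by the tracker's map slots. */
    public static int concurrentJvms(int mapTasks, int mapSlotsPerTracker) {
        if (mapTasks < 0 || mapSlotsPerTracker < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.min(mapTasks, mapSlotsPerTracker);
    }

    /** Threads calling map() at once: each running JVM has its own pool. */
    public static int concurrentMapThreads(int mapTasks,
                                           int mapSlotsPerTracker,
                                           int threadsPerMapper) {
        if (threadsPerMapper < 1) {
            throw new IllegalArgumentException("need at least one thread");
        }
        return concurrentJvms(mapTasks, mapSlotsPerTracker) * threadsPerMapper;
    }

    public static void main(String[] args) {
        // The case from the question: 4 slots, 4 threads, a job with 4 map tasks.
        System.out.println("JVMs: " + concurrentJvms(4, 4));
        System.out.println("map() threads: " + concurrentMapThreads(4, 4, 4));
    }
}
```

With more map tasks than slots (say 10 tasks, 4 slots), the model still gives 4 JVMs at a time and the remaining tasks wait for a free slot. With fewer tasks than slots (2 tasks), only 2 JVMs start, so 8 threads.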

Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
