hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Running tasks in the TaskTracker VM
Date Mon, 19 Mar 2007 18:13:31 GMT
Philippe Gassmann wrote:
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines. Launching a
> JVM is costly: let's say it costs 1 second (I'm being optimistic). If you
> have to run 2000 maps, 2000 seconds will be lost launching JVMs...

The InputFormat controls the number of map tasks.  If 2000 is too
many, so that JVM startup time dominates, then you can develop an
InputFormat that splits the input into fewer tasks so that this is
not a problem.
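
To make the arithmetic concrete, here is a minimal sketch of the idea,
not the actual Hadoop InputFormat API.  The names SplitPlanner, Range
and plan are hypothetical, and the 64 MB block size and 1 second per
JVM are assumptions taken only for illustration.  It caps the number
of ranges (and hence tasks) and shows how the startup overhead scales
with the task count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models how a custom InputFormat might cap the
// number of splits. It does not use or mirror the real Hadoop classes.
class SplitPlanner {

    /** A contiguous byte range of the input (hypothetical type). */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts totalBytes into at most maxTasks ranges of near-equal size. */
    static List<Range> plan(long totalBytes, int maxTasks) {
        if (totalBytes < 0 || maxTasks <= 0) {
            throw new IllegalArgumentException("need totalBytes >= 0 and maxTasks > 0");
        }
        List<Range> ranges = new ArrayList<>();
        if (totalBytes == 0) {
            return ranges;
        }
        // Ceiling division, so the range count never exceeds maxTasks.
        long size = (totalBytes + maxTasks - 1) / maxTasks;
        for (long start = 0; start < totalBytes; start += size) {
            ranges.add(new Range(start, Math.min(size, totalBytes - start)));
        }
        return ranges;
    }

    /** Total JVM startup cost in seconds, assuming one JVM per task. */
    static double startupOverheadSeconds(int tasks, double secondsPerJvm) {
        return tasks * secondsPerJvm;
    }

    public static void main(String[] args) {
        // Assumed input: 2000 blocks of 64 MB, i.e. one map per block by default.
        long total = 2000L * 64 * 1024 * 1024;
        List<Range> fewer = plan(total, 200);
        System.out.println(fewer.size() + " tasks, about "
                + startupOverheadSeconds(fewer.size(), 1.0)
                + " s spent starting JVMs");
    }
}
```

The trade-off is that fewer, larger splits reduce startup overhead
but also reduce the parallelism available to spread the load.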

> A bit of refactoring of the TaskRunner hierarchy is needed for this to
> work: the code that launches tasks in the JVM or in a separate process is
> very similar, and it would make sense for TaskRunner to be the
> superclass of an InJVMRunner and a ChildJVMRunner.
> But what can we do with MapTaskRunner and ReduceTaskRunner? It is not
> acceptable to have, let's say, 2 or more implementations of
> MapTaskRunner (one for execution in a child JVM, one for execution in
> the tracker JVM...). It would be painful to maintain and very complicated.

Perhaps it is too complicated for now, but I think we will want 
something like that long-term, so it is worth thinking about.
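
One possible shape for this, offered only as a sketch under stated
assumptions and not as the existing TaskRunner code, is to separate
what runs (map or reduce work) from where it runs (tracker JVM or
child JVM), so that neither side has to be duplicated.  All names
below (RunnerSketch, TaskBody, Launcher, InJvmLauncher,
ChildJvmLauncher, example.ChildMain) are hypothetical, and the
child-JVM launcher only builds a command line instead of spawning a
process.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: composition instead of one runner subclass per
// (task kind, execution mode) pair. Not the real Hadoop hierarchy.
class RunnerSketch {

    /** What to run: the map or reduce work itself. */
    interface TaskBody {
        String name();
        void run();
    }

    /** Where to run it. */
    interface Launcher {
        String launch(TaskBody body);
    }

    /** Runs the task on a thread inside the current JVM and waits for it. */
    static final class InJvmLauncher implements Launcher {
        @Override
        public String launch(TaskBody body) {
            Thread worker = new Thread(body::run, body.name());
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + body.name(), e);
            }
            return "in-jvm:" + body.name();
        }
    }

    /** Only builds the child command line; a real version would spawn it. */
    static final class ChildJvmLauncher implements Launcher {
        List<String> command(TaskBody body) {
            // "example.ChildMain" is a made-up entry point for illustration.
            return Arrays.asList("java", "example.ChildMain", body.name());
        }

        @Override
        public String launch(TaskBody body) {
            return "child-jvm:" + String.join(" ", command(body));
        }
    }

    /** Convenience factory wrapping a Runnable as a named task. */
    static TaskBody task(String name, Runnable work) {
        return new TaskBody() {
            @Override public String name() { return name; }
            @Override public void run() { work.run(); }
        };
    }

    public static void main(String[] args) {
        TaskBody map = task("map_0001", () -> System.out.println("mapping"));
        Launcher[] launchers = { new InJvmLauncher(), new ChildJvmLauncher() };
        for (Launcher launcher : launchers) {
            System.out.println(launcher.launch(map));
        }
    }
}
```

With this split, a map or reduce task is written once and either
launcher can run it, which avoids maintaining two MapTaskRunner
variants.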

Doug
