hadoop-common-dev mailing list archives

From: Doug Cutting <cutt...@apache.org>
Subject: Re: Running tasks in the TaskTracker VM
Date: Mon, 19 Mar 2007 16:54:58 GMT

Philippe Gassmann wrote:
> At the moment, for each task (map or reduce) a new JVM is created by the
> TaskTracker to run the Job.
> We have in our Hadoop cluster a high number of small files thus
> requiring a high number of map tasks. I know this is suboptimal, but
> aggregating those small files is not possible now. So an idea came to us:
> launching jobs in the TaskTracker JVM so the overhead of creating a
> new JVM will disappear.

A simpler approach might be to develop an InputFormat that includes 
multiple files per split.
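
As a rough illustration of the grouping such an InputFormat would need, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class MultiFileSplitPacker, the method pack, and the targetBytes parameter are hypothetical names chosen for this example. It only shows how many small files could be packed into a few groups, each of which would then back one split and therefore one map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: groups many small files into
// a few groups so that one map task can process several files.
final class MultiFileSplitPacker {

    // Packs files, in the map's iteration order, into groups whose total
    // length stays at or below targetBytes. A file larger than targetBytes
    // ends up in a group of its own.
    static List<List<String>> pack(Map<String, Long> fileLengths, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> entry : fileLengths.entrySet()) {
            long length = entry.getValue();
            // Close the current group if adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + length > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(entry.getKey());
            currentBytes += length;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

A real InputFormat would additionally need a record reader that moves from one file to the next within a split; that part is omitted here.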

> I already have a working patch against the 0.10.1 release of Hadoop that
> launches tasks inside the TaskTracker JVM if a specific parameter is set
> in the JobConf of the launched Job (for jobs we trust ;) ).

Ideally this would be done through a task-running interface that permits
one to plug in different implementations.  For example, sometimes it may
make sense to run tasks in-process, sometimes to run them in a child
JVM, and sometimes to fork a non-Java sub-process.  So, rather than
specifying a flag on the job, one would specify the runner
implementation class.
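
One possible shape for such an interface, sketched in plain Java. Every name here (TaskRunner, InProcessTaskRunner, ChildJvmTaskRunner, TaskRunners, CountingTask) is hypothetical and not taken from Hadoop, and a task is simplified to "a class with a main method". The point is only that the job names a runner class, and the runner is instantiated by reflection instead of being selected by a boolean flag.

```java
import java.lang.reflect.Method;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical plug-in point: runs a task and returns its exit code.
interface TaskRunner {
    int run(String mainClass, String... args) throws Exception;
}

// Runs the task inside the calling JVM: no startup cost, but no isolation.
final class InProcessTaskRunner implements TaskRunner {
    public int run(String mainClass, String... args) throws Exception {
        Method main = Class.forName(mainClass).getMethod("main", String[].class);
        main.setAccessible(true);
        main.invoke(null, (Object) args);
        return 0;
    }
}

// Runs the task in a freshly started child JVM with the same classpath.
final class ChildJvmTaskRunner implements TaskRunner {
    public int run(String mainClass, String... args) throws Exception {
        List<String> command = new ArrayList<>();
        command.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        command.addAll(Arrays.asList(args));
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}

// Instantiates the runner named by the job, in place of a per-job flag.
final class TaskRunners {
    static TaskRunner create(String runnerClassName) throws Exception {
        return Class.forName(runnerClassName)
                .asSubclass(TaskRunner.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}

// Trivial example task: counts how many times it has run in this JVM.
final class CountingTask {
    static final AtomicInteger RUNS = new AtomicInteger();

    public static void main(String[] args) {
        RUNS.incrementAndGet();
    }
}
```

With this shape, a runner that forks a non-Java sub-process would be one more TaskRunner implementation, and a trusted job would simply name InProcessTaskRunner as its runner class.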

