hadoop-common-dev mailing list archives

From Philippe Gassmann <philippe.gassm...@anyware-tech.com>
Subject Running tasks in the TaskTracker VM
Date Mon, 19 Mar 2007 14:46:55 GMT
Hi Hadoop guys,

At the moment, the TaskTracker creates a new JVM for each task (map or
reduce) it runs.

Our Hadoop cluster holds a large number of small files, which requires a
large number of map tasks. I know this is suboptimal, but aggregating
those small files is not possible for now. So we came up with an idea:
launching tasks in the TaskTracker JVM, so that the overhead of creating
a new VM disappears.

I already have a working patch against the 0.10.1 release of Hadoop
that launches tasks inside the TaskTracker JVM if a specific parameter
is set in the JobConf of the launched job (for jobs we trust ;) ). Each
new task has its own class loader, which loads every class the task
needs, as if it were running in a brand new JVM. (the same "classpath" is
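The per-task class loader idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch itself: the names `InVmTaskRunner`, `newTaskClassLoader` and `runTask` are hypothetical, the task is assumed to expose a static `main(String[])` entry point, and `ClassLoader.getPlatformClassLoader()` assumes Java 9 or later (on older JVMs, passing `null` selects the bootstrap loader as parent instead).

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch: run a task inside the current JVM through its own class loader.
 * Assumes Java 9+ for ClassLoader.getPlatformClassLoader().
 */
class InVmTaskRunner {

    /** Creates a loader that sees only the task's classpath plus the JDK classes. */
    static URLClassLoader newTaskClassLoader(URL[] taskClasspath) {
        // The platform loader (not the application loader) is the parent, so the
        // TaskTracker's own classes are not visible to the task.
        return new URLClassLoader(taskClasspath, ClassLoader.getPlatformClassLoader());
    }

    /** Loads mainClass through a fresh loader and calls its static main(String[]). */
    static void runTask(URL[] taskClasspath, String mainClass, String[] args) throws Exception {
        try (URLClassLoader loader = newTaskClassLoader(taskClasspath)) {
            Thread current = Thread.currentThread();
            ClassLoader previous = current.getContextClassLoader();
            // Libraries that discover classes through the thread context loader
            // should see the task's loader while the task runs.
            current.setContextClassLoader(loader);
            try {
                Class<?> taskClass = Class.forName(mainClass, true, loader);
                Method main = taskClass.getMethod("main", String[].class);
                main.invoke(null, (Object) args);
            } finally {
                // Restore the previous loader so this thread does not pin the task's loader.
                current.setContextClassLoader(previous);
            }
        } // close() releases any jar files the loader opened
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical empty classpath: a real task would pass the job jar and its lib/ entries.
        URLClassLoader loader = newTaskClassLoader(new URL[0]);
        // JDK classes resolve through the parent, so this prints true.
        System.out.println(loader.loadClass("java.util.ArrayList") == java.util.ArrayList.class);
        try {
            loader.loadClass(InVmTaskRunner.class.getName());
            System.out.println("visible");
        } catch (ClassNotFoundException e) {
            // The runner itself lives on the application classpath, which the task cannot see.
            System.out.println("isolated");
        }
        loader.close();
    }
}
```

Note that this isolates classes only: a task that calls `System.exit` or changes system properties still affects the whole TaskTracker, which is one reason to keep such an option limited to trusted jobs.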

For this to work, commons-logging must be upgraded to version 1.1 to
work around class loader / memory leak issues. I've profiled the
TaskTracker with JProfiler to find and remove memory leaks, so I'm
pretty confident in this code.
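A common way a per-task class loader leaks is through a static cache keyed by `ClassLoader`: as long as the cache holds the key strongly, the loader and every class it loaded stay in memory. The sketch below is a generic illustration of that pattern with a weak-keyed map plus explicit eviction; `PerLoaderCache` is hypothetical and is not commons-logging's code (commons-logging does expose `LogFactory.release(ClassLoader)` for explicit cleanup of its own per-loader state).

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-class-loader cache (illustration only, not commons-logging code).
 * Keys are held weakly, so once a task's loader is otherwise unreachable its entry
 * can be cleared and the loader garbage collected.
 */
final class PerLoaderCache<V> {
    private final Map<ClassLoader, V> entries = new WeakHashMap<>();
    private final Function<ClassLoader, V> factory;

    PerLoaderCache(Function<ClassLoader, V> factory) {
        this.factory = factory;
    }

    /** Returns the cached value for this loader, creating it on first use. */
    synchronized V get(ClassLoader loader) {
        return entries.computeIfAbsent(loader, factory);
    }

    /** Explicit eviction, e.g. when a task finishes; does not rely on GC timing. */
    synchronized void release(ClassLoader loader) {
        entries.remove(loader);
    }

    /** Number of live entries (WeakHashMap drops cleared keys before counting). */
    synchronized int size() {
        return entries.size();
    }
}
```

One caveat: a value that strongly references its own key loader (for example, an object whose class was loaded by that loader) keeps the key reachable, so the weak entry is never cleared; calling `release` when the task ends is the safer route in that case.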

If you are interested in this, please let me know, and I will attach a
patch against the current Hadoop trunk in Jira as soon as possible.

