hadoop-common-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Re: Running tasks in the TaskTracker VM
Date Tue, 20 Mar 2007 16:01:33 GMT
Stephane Bailliez wrote:
> Torsten Curdt wrote:
>>
>>> Being a complete idiot when it comes to distributed computing, I
>>> would say it is easy to blow up a JVM when running such distributed
>>> jobs (whether through OOM or anything else).
>>
>> Then restrict what people can do - at least Google went that route.
>
> I don't know the specifics of what Google did :)

They came up with their own language for MapReduce jobs:
http://labs.google.com/papers/sawzall.html

> If you want to do that with Java, restricting memory usage, CPU usage
> and descriptor access within each in-VM instance is a considerable
> amount of work. It likely implies writing a specific agent for the VM
> (or rather an agent for a specific VM, since it's pretty unlikely that
> you will get the same results across VMs), assuming it can then really
> be done at the classloader level for each task (which looks insanely
> complex to me once you consider allocations done at the parent
> classloader level, etc.).
>
> At least by forking a VM you get reasonably bounded control over
> resource usage (or at least memory) without bringing everything down,
> since a VM is already bounded to some degree.
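
(As an aside, a rough sketch of what forking a task into a memory-bounded
child JVM could look like from the tracker side; the task entry point,
class name and heap size below are invented for illustration, not
Hadoop's actual code:)

  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  // Sketch only: run a task in a child JVM with a bounded heap, so an
  // OOM kills the child process, never the tracker itself.
  public class ForkedTaskLauncher {
      public static void main(String[] args) throws Exception {
          ProcessBuilder pb = new ProcessBuilder(
                  "java", "-Xmx256m",              // hard memory bound for the task
                  "-cp", System.getProperty("java.class.path"),
                  "org.example.TaskMain");         // hypothetical task entry point
          pb.redirectErrorStream(true);
          Process child = pb.start();

          // Drain the child's output so it cannot block on a full pipe buffer.
          BufferedReader out = new BufferedReader(
                  new InputStreamReader(child.getInputStream()));
          for (String line; (line = out.readLine()) != null; ) {
              System.out.println("[task] " + line);
          }
          System.out.println("task exited with status " + child.waitFor());
      }
  }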
>
>
>>> Failing jobs are not exactly uncommon, and running things in a
>>> sandboxed environment with less risk for the tracker seems like a
>>> perfectly reasonable choice. So yeah, VM pooling certainly makes
>>> perfect sense for it.
>>
>> I am still not convinced - sorry
>>
>> It's a bit like wanting to run JSPs in a separate JVM because they
>> might take down the servlet container.
>
> That comparison is a bit too extreme in granularity. I think it is
> more like the choice of running n different webapps within the same VM
> or not: if one webapp is a resource hog, separating it does not harm
> the n-1 other applications, and you can either create another server
> instance or move it to another node.
>
> I know of environments with a large number of nodes (not related to
> Hadoop) where they also reboot a set of nodes daily to ensure that all
> machines are really in working condition (it's usually when a machine
> reboots after a failure that someone has to rush to it because some
> service was never registered to start at boot, or things like that, so
> this periodic check gives people a better idea of their response time
> to failure). That depends on operational procedures, for sure.

This could be another implementation of the TaskTracker: a single JVM
that forks a "replacement JVM" after either a given time or a given
number of executed tasks. This would avoid per-task JVM fork overhead
while also avoiding memory leak problems.
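
A rough, purely illustrative sketch of that recycling idea (all names
and the recycle threshold are made up; real code would get its tasks
from the TaskTracker):

  // Sketch only: a task-running JVM that serves a fixed number of
  // tasks, then launches a fresh replacement of itself and exits, so
  // whatever memory was leaked is reclaimed along with the process.
  public class RecyclingTaskRunner {
      private static final int MAX_TASKS = 100;   // recycle threshold (illustrative)

      public static void main(String[] args) throws Exception {
          for (int served = 0; served < MAX_TASKS; served++) {
              Runnable task = nextTask();          // hypothetical: poll the tracker
              if (task == null) break;
              task.run();
          }
          // Spawn the replacement before exiting so there is no gap in service.
          new ProcessBuilder("java",
                  "-cp", System.getProperty("java.class.path"),
                  RecyclingTaskRunner.class.getName())
                  .start();
          System.exit(0);                          // process death frees leaked memory
      }

      private static Runnable nextTask() {
          return null;                             // placeholder for real task fetching
      }
  }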

The replacement JVM could even be pre-forked and monitor the active one,
taking over if it no longer responds (and killing it if needed).
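
Again only a sketch of what that watchdog could look like (the heartbeat
port and timeouts are invented):

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.net.Socket;

  // Sketch only: the pre-forked standby probes the active JVM on a local
  // heartbeat port and takes over once the active JVM stops answering.
  public class StandbyWatchdog {
      private static final int HEARTBEAT_PORT = 50555;  // hypothetical heartbeat port

      public static void main(String[] args) throws Exception {
          while (isAlive()) {
              Thread.sleep(5000);                        // poll interval (illustrative)
          }
          becomeActive();                                // active JVM is unresponsive
      }

      private static boolean isAlive() {
          try {
              Socket s = new Socket();
              s.connect(new InetSocketAddress("localhost", HEARTBEAT_PORT), 2000);
              s.close();
              return true;                               // connection accepted: still alive
          } catch (IOException unreachable) {
              return false;
          }
      }

      private static void becomeActive() {
          // Placeholder: bind the heartbeat port, pre-fork the next standby,
          // start accepting tasks, and possibly kill the old process.
      }
  }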

Sylvain

-- 
Sylvain Wallez - http://bluxte.net

