hadoop-common-user mailing list archives

From "Christophe Taton" <ta...@apache.org>
Subject Re: Realtime Map Reduce = Supercomputing for the Masses?
Date Mon, 02 Jun 2008 12:12:22 GMT
Hi Steve,

On Mon, Jun 2, 2008 at 12:23 PM, Steve Loughran <stevel@apache.org> wrote:

> Christophe Taton wrote:
>> Actually Hadoop could be made more friendly to such realtime Map/Reduce
>> jobs.
>> For instance, we could consider running all tasks inside the task tracker
>> jvm as separate threads, which could be implemented as another personality
>> of the TaskRunner.
>> I looked into this a couple of weeks ago...
>> Would you be interested in such a feature?
> Why does that have benefits? So that you can share stuff via local data
> structures? Because you'd better be sharing classloaders if you are going to
> play that game. And that is very hard to get right (to the extent that I
> don't think any Apache project other than Felix does it well).

The most obvious improvement, to my mind, concerns the memory footprint of the
infrastructure. Running jobs requires at least 3 JVMs per machine (the data
node, the task tracker and the task), even if you give up parallelism and
accept running only one task per node at a time. This is a problem on
machines with little memory.

That said, I agree with your concerns about classloading.
I have actually been thinking that we might rely on OSGi to do the
job, and package Hadoop daemons, jobs and tasks as OSGi bundles and
services; but I ran into many tricky issues in doing that (the latest being
the resolution of configuration files by the classloaders).
To my mind, one short-term and minimal way of achieving this would be to use
a URLClassLoader in conjunction with the HDFS URLStreamHandler, to let the
task tracker run tasks directly...
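
A minimal sketch of that last idea, not code taken from Hadoop: the names
InProcessTaskRunner, runTask and EchoTask are hypothetical, and tasks are
assumed to implement java.util.concurrent.Callable rather than Hadoop's real
task interfaces. It loads a task class through a per-task URLClassLoader and
runs it on a thread inside the current JVM. Registering a handler for hdfs://
URLs is only shown as a comment, since it needs Hadoop on the classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not a Hadoop class): runs one task as a thread in the
// current JVM, loading its class through a dedicated URLClassLoader.
final class InProcessTaskRunner {

    static Object runTask(URL[] classpath, String taskClassName) throws Exception {
        // Assumption: with Hadoop available, hdfs:// URLs would first need a
        // stream handler, for example (allowed only once per JVM):
        //   URL.setURLStreamHandlerFactory(
        //       new org.apache.hadoop.fs.FsUrlStreamHandlerFactory());
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (URLClassLoader loader = new URLClassLoader(
                classpath, InProcessTaskRunner.class.getClassLoader())) {
            // Resolve and instantiate the task through the per-task loader.
            Class<?> taskClass = Class.forName(taskClassName, true, loader);
            Callable<?> task =
                (Callable<?>) taskClass.getDeclaredConstructor().newInstance();

            // Run the task on a worker thread whose context classloader is
            // the per-task loader, so lookups made by the task go through it.
            Callable<Object> wrapped = () -> {
                Thread.currentThread().setContextClassLoader(loader);
                return task.call();
            };
            Future<Object> result = pool.submit(wrapped);
            return result.get(); // wait before the loader is closed
        } finally {
            pool.shutdown();
        }
    }
}

// Hypothetical task used only for illustration: reports which kind of
// classloader is installed as the context classloader of its thread.
final class EchoTask implements Callable<String> {
    @Override
    public String call() {
        return Thread.currentThread().getContextClassLoader()
                     .getClass().getSimpleName();
    }
}
```

Calling InProcessTaskRunner.runTask(new URL[0], EchoTask.class.getName())
resolves EchoTask through the parent loader and returns the simple name of the
worker thread's context classloader. A real task tracker would pass the job
jar URL instead of an empty array, and would still have to address the
classloader isolation concerns raised above.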

Christophe T.
