hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Can you run multiple simultaneous hadoop jobs?
Date Fri, 23 May 2008 16:43:51 GMT


Looks like a nice piece of work.  I just spent 20 minutes looking back for
the old bug HADOOP-2573 only to find that you already knew about that and
had addressed it.

This could really help people make progress on improving the scheduler and
that progress would really improve the usability of large clusters with jobs
that vary a lot in importance and size.

And your English is just fine.  Completely understandable.

On 5/23/08 12:53 AM, "Brice Arnould" <brice@vleu.net> wrote:

> Kayla Jay wrote:
>> I'm trying to figure out why I need to use HOD vs. trying to run multiple
>> jobs at the same time on the same set of resources.  Is it possible to run
>> multiple hadoop jobs at the same time on the same set of input data?
>> I tried to run different jobs on the same set of data at the same time,
>> but it takes a very long while, and it almost appears as if the jobs queue
>> up and each one has to wait for the previous one before completing.
>> So, I tried moving onto HOD.  It's not very apparent why one would want
>> to use HOD to run on different nodes at the same time for different
>> jobs that access the same set of input data.
>> Can anyone provide any feedback on running multiple jobs at the same
>> time on the same set of data?  HOD use?  Why would I have to run HOD
>> and schedule running multiple jobs at the same time on the same
>> set of data, but within their own set of resources in the cluster?
> Hi!
> I just contributed a new implementation of the scheduler that adds an
> option called "mapred.jobtracker.scheduler.maxRunningTasksPerJob",
> which lets you limit the number of nodes allocated to a job (so you do
> not need to use HOD).
> This limit is only a hint: if some nodes have nothing to do, they will be
> allocated anyway.
> If you want to test it, the patch is attached to issue HADOOP-3412:
> http://issues.apache.org/jira/browse/HADOOP-3412
> It applies to trunk, but I can make a few modifications if you want it to
> apply to a release.
> Running "ant jar" should be sufficient to build it, but please ask me if
> you have more questions.
> I would really appreciate your feedback about the behavior of that
> scheduler. I'm trying to solve precisely those problems that result from
> partitioned clusters, and I'll try to do something that better suits
> your needs if you can tell me more.
> Brice
> PS: Please excuse my English :-P
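
The "limit is a hint" behavior described above can be illustrated with a small standalone sketch. This is not the code from the HADOOP-3412 patch and uses no Hadoop APIs; the class name SoftCapScheduler, the method assign, and the choice to hand leftover slots to jobs in submission order are all assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration of a soft per-job cap; not the HADOOP-3412 code.
public class SoftCapScheduler {

    /**
     * pending[j] = tasks job j still wants to run,
     * cap        = soft per-job limit (the "hint"),
     * slots      = total task slots available in the cluster.
     * Returns how many tasks each job gets to run.
     */
    public static int[] assign(int[] pending, int cap, int slots) {
        int[] running = new int[pending.length];
        int free = slots;

        // Pass 1: respect the per-job cap so that no job takes the whole cluster.
        for (int j = 0; j < pending.length && free > 0; j++) {
            int grant = Math.min(Math.min(pending[j], cap), free);
            running[j] = grant;
            free -= grant;
        }

        // Pass 2: the cap is only a hint, so slots that would otherwise sit
        // idle go to jobs that still have work (assumed: in submission order).
        for (int j = 0; j < pending.length && free > 0; j++) {
            int grant = Math.min(pending[j] - running[j], free);
            running[j] += grant;
            free -= grant;
        }
        return running;
    }

    public static void main(String[] args) {
        // Two jobs, cap 2, six slots: the two spare slots are handed out anyway.
        System.out.println(Arrays.toString(assign(new int[] {10, 10}, 2, 6)));     // [4, 2]
        // Three jobs, cap 2, four slots: the cap keeps the first job from taking all four.
        System.out.println(Arrays.toString(assign(new int[] {10, 10, 10}, 2, 4))); // [2, 2, 0]
    }
}
```

In the second call the third job still waits, but the second job starts immediately instead of queuing behind the first, which is the situation the original question describes.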
