hadoop-common-user mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: Task type priorities during scheduling ?
Date Wed, 26 Jul 2006 20:21:59 GMT
Doug,

I agree that this isn't a high-priority change; I'm just trying to start a
discussion about what is needed to make multi-job scheduling work well.

I really like Yoram's suggestion of a single limit covering both map and reduce
tasks. Not charging the copy (shuffle) phase against that limit could be part
of making that work. Again, no urgency.
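
To make the idea concrete, here is a toy sketch of the accounting, written as
plain Java rather than actual Hadoop code (the phase names, the limit, and the
whole class are made up for illustration):

    // Toy model, not Hadoop code: one shared per-node limit covers both
    // map and reduce tasks, but a reduce that is still copying map
    // output (the shuffle) is not charged against it.
    import java.util.Arrays;
    import java.util.List;

    public class SlotAccounting {
        enum Phase { MAP, SHUFFLE, REDUCE }

        static int charged(List<Phase> running) {
            int n = 0;
            for (Phase p : running) {
                if (p != Phase.SHUFFLE) n++; // shuffle rides for free
            }
            return n;
        }

        public static void main(String[] args) {
            int maxTasksPerNode = 4; // one limit for maps + reduces
            List<Phase> running =
                Arrays.asList(Phase.MAP, Phase.MAP, Phase.MAP, Phase.SHUFFLE);
            // Four tasks are running, but only three count against the
            // limit, so the node can still accept one more.
            System.out.println("charged=" + charged(running)
                + " canSchedule=" + (charged(running) < maxTasksPerNode));
        }
    }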

We are already running two parallel clusters on the same boxes; we call them
Blue (normal) and Yellow (nice'd), named after the colors on the Ganglia CPU
display. We run long jobs on the nice'd cluster and short jobs at normal
priority.

It works really well. Kevin should be submitting the two patches we needed
to make it work.
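
If anyone wants to try the same setup, the client side is simple: point the
job at the second cluster's jobtracker. A minimal sketch against the JobConf
API (the host:port, paths, job name, and identity classes here are all
illustrative, not our actual configuration):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SubmitToYellow {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitToYellow.class);
            conf.setJobName("short-job-on-yellow");
            // Point the client at the nice'd ("Yellow") cluster's
            // jobtracker instead of the default (host:port made up).
            conf.set("mapred.job.tracker", "master:9002");
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setInputPath(new Path("/short-job/input"));   // illustrative
            conf.setOutputPath(new Path("/short-job/output")); // illustrative
            JobClient.runJob(conf);
        }
    }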

Paul

On 7/25/06, Doug Cutting <cutting@apache.org> wrote:
>
> Paul Sutter wrote:
> > First, it matters in the case of concurrent jobs. If you submit a
> > 20-minute job while a 20-hour job is running, it would be nice if the
> > reducers for the 20-minute job could get a chance to run before the
> > 20-hour job's mappers have all finished. So even without a throughput
> > improvement, you have an important capability (although it may require
> > another minor tweak or two to make this possible).
>
> I fear that more than a minor tweak or two are required to make
> concurrent jobs work well.  For example, you would also want to make
> sure that the long-running job does not consume all of the reduce slots,
> or the short job would again get stuck behind it.  Pausing long-running
> tasks might be required.
>
> The best way to do this at present is to run two job trackers, and two
> tasktrackers per node, then submit long-running jobs to one "cluster"
> and short-running jobs to the other.
>
> > Second, we often have stragglers, where one mapper runs slower
> > than the others. When this happens, we end up with a largely idle
> > cluster for as long as an hour. In cases like these, good support for
> > concurrent jobs _would_ improve throughput.
>
> Can you perhaps increase the number of map tasks, so that even a slow
> task takes only a very small portion of the total execution time?
>
> Good support for concurrent jobs would be great to have, and I'd love to
> see a patch that addresses this issue comprehensively.  I am not
> convinced that it is worth making minor tweaks that may or may not
> really help us get there.
>
> Doug
>
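
For reference, the map task count Doug mentions can be raised from the client
side. A minimal sketch (the count, paths, and classes are made up, and the
setting is a hint the framework may adjust rather than a guarantee):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class ManySmallMaps {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ManySmallMaps.class);
            conf.setJobName("many-small-maps");
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setInputPath(new Path("/job/input"));   // illustrative
            conf.setOutputPath(new Path("/job/output")); // illustrative
            // More, smaller map tasks: a straggler then holds up only a
            // small slice of the job instead of the whole tail end.
            conf.setNumMapTasks(2000);
            JobClient.runJob(conf);
        }
    }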
