From: "Paul Sutter" <sutter@gmail.com>
To: hadoop-user@lucene.apache.org
Date: Wed, 26 Jul 2006 13:21:59 -0700
Subject: Re: Task type priorities during scheduling?

Doug,

I agree that this isn't a high-priority change; I'm just trying to start a
discussion about what is needed to make multi-job scheduling work well.

I really like Yoram's suggestion of a single limit covering both map and
reduce tasks. Not charging the copy (shuffle) phase against that limit could
be part of making that work. Again, no urgency.
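To make that concrete, here is a rough, hypothetical sketch of what a single
combined limit that does not charge reducers still in the copy phase might
look like. This is illustrative Java only, not Hadoop's scheduler code and
not anything from this thread; the class, field, and method names are all
invented:

import java.util.List;

/**
 * Hypothetical sketch: one running-task limit shared by maps and reduces,
 * where reduce tasks still in the copy (shuffle) phase are not charged
 * against the limit.
 */
public class CombinedSlotLimit {

  /** Minimal stand-in for the state of one running task on a node. */
  public static class RunningTask {
    final boolean isReduce;
    final boolean inCopyPhase; // reduce still fetching map output
    RunningTask(boolean isReduce, boolean inCopyPhase) {
      this.isReduce = isReduce;
      this.inCopyPhase = inCopyPhase;
    }
  }

  private final int maxRunningTasks; // single limit for maps and reduces

  public CombinedSlotLimit(int maxRunningTasks) {
    this.maxRunningTasks = maxRunningTasks;
  }

  /** Count only tasks doing heavy work: maps, and reduces past the copy phase. */
  public int chargedTasks(List<RunningTask> running) {
    int charged = 0;
    for (RunningTask t : running) {
      if (!t.isReduce || !t.inCopyPhase) {
        charged++;
      }
    }
    return charged;
  }

  /** A new map or reduce may start while the charged count is under the limit. */
  public boolean canStartTask(List<RunningTask> running) {
    return chargedTasks(running) < maxRunningTasks;
  }
}

Under a rule like this, a node with a limit of, say, 4 could run 4 maps plus a
few reducers that are only copying data, so a short job's reducers could begin
their copy phase without waiting for a slot to free up.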
We are already running two parallel clusters on the same boxes; we call them
Blue (normal) and Yellow (nice'd), named after the colors on the Ganglia CPU
display. We run long jobs on the nice'd cluster and short jobs at normal
priority. It works really well. Kevin should be submitting the two patches we
needed to make it work.

Paul

On 7/25/06, Doug Cutting wrote:
>
> Paul Sutter wrote:
> > First, it matters in the case of concurrent jobs. If you submit a 20
> > minute job while a 20 hour job is running, it would be nice if the
> > reducers for the 20 minute job could get a chance to run before the 20
> > hour job's mappers have all finished. So even without a throughput
> > improvement, you gain an important capability (although it may require
> > another minor tweak or two to make possible).
>
> I fear that more than a minor tweak or two is required to make
> concurrent jobs work well. For example, you would also want to make
> sure that the long-running job does not consume all of the reduce slots,
> or the short job would again get stuck behind it. Pausing long-running
> tasks might be required.
>
> The best way to do this at present is to run two jobtrackers, and two
> tasktrackers per node, then submit long-running jobs to one "cluster"
> and short-running jobs to the other.
>
> > Secondarily, we often have stragglers, where one mapper runs slower
> > than the others. When this happens, we end up with a largely idle
> > cluster for as long as an hour. In cases like these, good support for
> > concurrent jobs _would_ improve throughput.
>
> Can you perhaps increase the number of map tasks, so that even a slow
> task takes only a very small portion of the total execution time?
>
> Good support for concurrent jobs would be great to have, and I'd love to
> see a patch that addresses this issue comprehensively. I am not
> convinced that it is worth making minor tweaks that may or may not
> really help us get there.
>
> Doug
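As a footnote to Doug's point about increasing the number of map tasks: with
the classic org.apache.hadoop.mapred API this is a one-line hint on the job
configuration. A rough sketch follows, with invented paths and counts, and
with method names as they appear in later 0.x releases (the release current
at the time of this thread may differ slightly):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ManySmallMaps {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ManySmallMaps.class);
    conf.setJobName("many-small-maps");

    // Invented example paths.
    FileInputFormat.setInputPaths(conf, new Path("/data/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/data/output"));

    // Ask for many more map tasks than there are nodes, so a single slow
    // ("straggler") map holds up only a small slice of the total work.
    // The framework treats this as a hint, not an exact count.
    conf.setNumMapTasks(2000);
    conf.setNumReduceTasks(20);

    // Mapper and reducer default to the identity classes, so this runs
    // as-is over text input.
    JobClient.runJob(conf);
  }
}

With 2000 map tasks on, say, a 50-node cluster, a straggler that runs twice as
long as its peers delays the job by roughly one task's worth of work rather
than by a large fraction of the map phase.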