From: David McLaughlin
Date: Thu, 30 Mar 2017 09:16:29 -0700
Subject: Re: schedule task instances spreading them based on a host attribute.
To: dev@aurora.apache.org

I think this is more complicated than multiple scheduling algorithms. The
problem you'll end up having if you try to solve this in the scheduling loop
is when resources are unavailable because preemptible tasks are running on
them, rather than because hosts are down. Right now the fact that the task
cannot be scheduled is important, because it triggers preemption and makes
room. An alternative algorithm that tries at all costs to schedule the task
in the TaskAssigner could decide to place the task in a non-ideal slot and
leave a preemptible task running instead.

It's also important to think of the knock-on effects here when we move to
offer affinity (i.e. the current Dynamic Reservation proposal). If you've
made this non-ideal compromise to get things scheduled, that decision will
basically be permanent until the host you're on goes down. At least with how
things work now, each scheduling attempt gives the job a fresh chance of
being placed in an ideal slot.

On Thu, Mar 30, 2017 at 8:12 AM, Rick Mangi wrote:

> Sorry for the late reply, but I wanted to chime in here as wanting to see
> this feature. We run a medium-size cluster (around 1000 cores) in EC2, and
> I think we could get better usage of the cluster with more control over
> the distribution of job instances. For example, it would be nice to limit
> the number of Kafka consumers running on the same physical box.
>
> Best,
>
> Rick
>
>
> On 2017-03-06 14:44 (-0400), Mauricio Garavaglia wrote:
> > Hello!
> >
> > I have a job that has multiple instances (>100) that I'd like to spread
> > across the hosts in a cluster.
> > Using a constraint such as "limit=host:1" doesn't work quite well, as I
> > have more instances than nodes.
> >
> > As a workaround I increased the limit value to something like
> > ceil(instances/nodes). But now the problem happens if a bunch of nodes
> > go down (think a whole rack dies), because the instances will not run
> > until they are back, even though we may have spare capacity on the rest
> > of the hosts that we'd like to use. In that scenario, the job
> > availability may be affected because it's running with fewer instances
> > than expected. On a smaller scale, the same approach would also apply if
> > you want to spread tasks across racks or availability zones. I'd like to
> > have one instance of a job per rack (failure domain), but if a rack goes
> > down, the instance can be spawned on a different rack.
> >
> > I thought we could have a scheduling constraint to "spread" instances
> > across a particular host attribute; instead of vetoing an offer right
> > away, we check where the other instances of a task are running, looking
> > at a particular attribute of the host. We try to maximize the number of
> > different values of a particular attribute (rack, hostname, etc.) across
> > the task instances' assignments.
> >
> > What do you think? Did something like this come up in the past? Is it
> > feasible?
> >
> >
> > Mauricio
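For readers following the thread: the ceil(instances/nodes) workaround sizes the limit constraint so every instance fits when all hosts are healthy, but the limit is fixed while the node count is not, which is exactly the failure mode Mauricio describes. A quick arithmetic sketch (the numbers are illustrative, not from the thread):

```python
import math

instances = 120
nodes = 40  # healthy hosts at the time the job was configured
# Per-host limit chosen so all instances can be placed: ceil(120 / 40) = 3.
limit = math.ceil(instances / nodes)

# If 10 nodes die (e.g. a rack failure), total capacity under the fixed
# limit drops to 30 hosts * 3 instances = 90 slots, so 30 instances stay
# pending even if the surviving hosts have spare resources.
remaining_capacity = (nodes - 10) * limit
pending = instances - remaining_capacity
```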
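The "spread" behavior Mauricio proposes can be sketched as a scoring pass instead of a veto: count how many instances of the job already run on each value of the chosen attribute, then prefer the offer whose attribute value is least used. A minimal standalone illustration of that idea; the names `Offer`, `pick_offer`, and the in-memory `placements` map are hypothetical and not Aurora or Mesos APIs, and a real implementation would live alongside Aurora's existing filtering/preemption logic:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Offer:
    host: str
    attributes: dict = field(default_factory=dict)  # e.g. {"rack": "r1"}


def pick_offer(offers, placements, attribute):
    """Choose the offer whose attribute value currently hosts the fewest
    instances of this job (spread semantics rather than a hard limit).

    `placements` maps instance id -> attribute value of its current host,
    so usage[v] is the number of instances already on attribute value v.
    Ties are broken by offer order.
    """
    usage = Counter(placements.values())
    return min(offers, key=lambda o: usage[o.attributes[attribute]])


offers = [Offer("h1", {"rack": "r1"}), Offer("h2", {"rack": "r2"})]
placements = {0: "r1", 1: "r1", 2: "r2"}  # rack r1 is busier than r2
best = pick_offer(offers, placements, "rack")  # prefers the r2 host
```

Note this sketch always returns *some* offer, which is the knock-on effect David raises: because nothing is vetoed, the task never looks unschedulable, so preemption is never triggered to make an ideal slot available.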