aurora-issues mailing list archives

From "Brian Hatfield (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1579) Allow preflight-check of Job schedulability.
Date Wed, 13 Jan 2016 19:02:40 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Hatfield updated AURORA-1579:
-----------------------------------
    Description: 
The goal of this feature is to allow users to check if their job (as configured) would likely
be schedulable given Aurora's current offers. An extended form of this feature would be able
to perform this test while assuming any current instance of the job in question would be stopped.

Here is the suggestion I sent to the mailing list describing my use-case for such a feature:
{quote}
We currently run a (relatively) small Mesos/Aurora cluster, and don't always have significant
resource overhead available.

Sometimes, we go to schedule a job and we're just short of what we estimated-by-hand we'd
need in the cluster for it. Most of the tasks schedule - but a few stay "PENDING" because
of the resource constraint. This often confuses users, or in some cases, causes the command
to block for a while until it eventually times out.

We're currently working in-house on automating somewhat-more-precise basic estimation with
information sourced from /offers to get a sense of "nope, your task won't schedule" to provide
fast feedback that doesn't manipulate the state of the cluster. 

However, our basic estimation doesn't include co-scheduling constraints, quotas, etc., which
seem like something Aurora would be able to determine.
{quote}

Note that this kind of feature is inherently subject to race conditions and future
restrictions. Somewhat paradoxically, it is more useful the smaller your quota or cluster
is, as many actions in a restricted environment will require adding capacity (or quota).
The documentation for this feature should mention that tasks can still end up pending even
after a successful check: losing a race, host failure, "oddly shaped tasks" failing to
reschedule, etc.
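
As a rough illustration of the basic, offers-only estimation described in the quote above,
here is a minimal sketch. It is not Aurora's implementation, and it does not parse the real
/offers output: the Resources type, its field names and the unplaced_instances function are
all hypothetical. It ignores co-scheduling constraints and quotas, and its answer is only as
current as the offers snapshot it is given.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; integer units avoid float rounding."""
    cpu_millis: int  # 1000 = one core
    ram_mb: int
    disk_mb: int


def instances_that_fit(offer: Resources, task: Resources) -> int:
    """How many identical task instances fit into a single offer."""
    return min(
        offer.cpu_millis // task.cpu_millis,
        offer.ram_mb // task.ram_mb,
        offer.disk_mb // task.disk_mb,
    )


def unplaced_instances(
    offers: Iterable[Resources],
    task: Resources,
    instances: int,
    reclaimed: Iterable[Resources] = (),
) -> int:
    """Return how many instances would likely stay PENDING (0 = likely fits).

    `reclaimed` models the extended form of the check: resources assumed to be
    freed if current instances of the job were stopped. Treating them as
    separate offers is a simplifying assumption; it can underestimate capacity
    because freed resources are not merged with offers from the same host.
    """
    if min(task.cpu_millis, task.ram_mb, task.disk_mb) <= 0:
        raise ValueError("task resource requirements must be positive")
    pool: List[Resources] = list(offers) + list(reclaimed)
    capacity = sum(instances_that_fit(offer, task) for offer in pool)
    return max(0, instances - capacity)


if __name__ == "__main__":
    offers = [Resources(4000, 8192, 10000), Resources(1000, 2048, 10000)]
    task = Resources(1000, 2048, 1000)
    print(unplaced_instances(offers, task, instances=7))
```

Because every instance of a job has the same shape, summing per-offer capacity is enough
here; no bin-packing heuristic is needed. A non-zero result corresponds to the "nope, your
task won't schedule" fast feedback, and a zero result is still subject to the race
conditions noted above.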

  was:
The goal of this feature is to allow users to check if their job (as configured) would likely
be schedulable given Aurora's current offers. An extended form of this feature would be able
to perform this test while assuming any current instance of the job in question would be stopped.

Here is the suggestion I sent to the mailing list describing my use-case for such a feature:
{quote}
We currently run a (relatively) small Mesos/Aurora cluster, and don't always have significant
resource overhead available.

Sometimes, we go to schedule a job and we're just short of what we estimated-by-hand we'd
need in the cluster for it. Most of the tasks schedule - but a few stay "PENDING" because
of the resource constraint. This often confuses users, or in some cases, causes the command
to block for a while until it eventually times out.

We're currently working in-house on automating somewhat-more-precise basic estimation with
information sourced from /offers to get a sense of "nope, your task won't schedule" to provide
fast feedback that doesn't manipulate the state of the cluster. 

However, our basic estimation doesn't include co-scheduling constraints, quotas, etc., which
seem like something Aurora would be able to determine.
{quote}


> Allow preflight-check of Job schedulability.
> --------------------------------------------
>
>                 Key: AURORA-1579
>                 URL: https://issues.apache.org/jira/browse/AURORA-1579
>             Project: Aurora
>          Issue Type: Task
>          Components: Client, Scheduler
>            Reporter: Brian Hatfield
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
