aurora-reviews mailing list archives

From David McLaughlin <>
Subject Re: Review Request 57487: Implementation of Dynamic Reservations Proposal
Date Tue, 04 Apr 2017 23:07:29 GMT

> On March 30, 2017, 11:56 p.m., David McLaughlin wrote:
> > The motivation for this is a performance optimization (less scheduling loop overhead + cache locality on the target host). So why should that decision be encoded in the service tier? We'd want every single task to use this, and we wouldn't want users to even know about it. And we still want to have the preferred vs. preemptible distinction.
> > 
> > Currently a task restart is a powerful tool to undo a bad scheduling round or to get off a host for whatever reason - e.g. to get away from a noisy neighbor or a machine that's close to falling over. If I'm reading this patch correctly, tasks lose this ability after this change? Or at least the process is now: kill the task, wait for some operator-defined timeout, and then schedule it again with the original config.
> > 
> > What happens when we want to extend the use of Dynamic Reservations and give users control over when they are collected? What tier would we use then? How would reserved offers be collected? It seems like this implementation is not future-proof at all.
> Dmitriy Shirchenko wrote:
>     David, thanks for your comment. Beyond the performance optimization, I would add the following improvements and features that this patch offers:
>     * Consistent MTTA for updates of any size, irrespective of cluster capacity and demand, assuming the update does not increase the resource vector (sizing down is OK).
>     * Shorter MTTR for tasks using Docker or the unified containerizer: reserved tasks get consistent placement on the same host, resulting in less work for the Mesos or Docker fetcher, since each host's warm cache can be leveraged and the previous image layers already exist.
>     * After a job is placed on each host, failed tasks cannot get stuck in a PENDING state, because we guarantee resource availability.
>     * This implementation lays the foundation for support of persistent volumes in Aurora.
>     The way the tier is added, you absolutely can make a reserved job preemptible. All you would do is specify a new tier definition in tiers.json and set both 'reserved' and 'preemptible' to `True`.
>     About restarts, you bring up a good point. I would like to add that if a task does not have `reserved` set to `True` inside `TierInfo`, nothing changes and restarts proceed by rescheduling the task onto a different host. However, if a task asks for reserved resources, that implies to us that it wants "stickiness", so the task would be scheduled on the same host. I agree that this contradicts the use case of trying to get away from noisy neighbors, and yes, the story is not great for that case. We can brainstorm possible solutions. If this turns out to be an immediately required feature, we can issue an `unreserve` operation on any offer that comes back from a reserved task before rescheduling it. How does that sound?

>     Would you elaborate on what you are referring to regarding control over dynamically reserved resources? Do we currently give users any control beyond host constraints? At the moment reserved offers are not collected; with @serb's nice suggestion they are simply expired if the offer goes unused. To collect them, we can bring back the `OfferReconciler` if the complexity warrants it.

Well, the main point about preemptible is that users now have to opt into the reserved tier, and we need to set up a tier for each combination of parameters (side note: I'm not a huge fan of this static tiers concept; it seems broken to me). Currently there is no easy way to automatically migrate running tasks from one tier to another, nor to force users to update to a certain tier without making a custom client. In fact, because of the DSL it's not even possible to upgrade all of the job configs to this new tier either.
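For concreteness, the static-tier combinations under discussion would presumably be expressed in tiers.json along these lines. This is only a sketch: the `reserved` key is the hypothetical field added by the patch, and the tier names beyond the standard `preferred`/`preemptible`/`revocable` are made up here.

```json
{
  "default": "preemptible",
  "tiers": {
    "preferred":           {"revocable": false, "preemptible": false},
    "preemptible":         {"revocable": false, "preemptible": true},
    "revocable":           {"revocable": true,  "preemptible": true},
    "reserved":            {"revocable": false, "preemptible": false, "reserved": true},
    "reservedPreemptible": {"revocable": false, "preemptible": true,  "reserved": true}
  }
}
```

One named tier per parameter combination multiplies quickly, which is exactly the objection being raised above.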

Why is this a problem? For our use case at Twitter, we want this feature in order to run our clusters close to full capacity. Currently Aurora does not cope well when there is very little headroom. The fact that preemptible or revocable tasks can get launched on an offer vacated by a production task during an update is problematic: it can force the prod job through the preemption loop to get the same slot back, adding hours to a production job's deploy time (and the churn created severely impacts those trying to run test jobs). Because this is a performance optimization, we do not want users to opt in to this feature; instead it should be applied to every service without users even knowing.

My point about control over dynamically reserved resources relates to your last point that this lays a foundation for persistent storage. Right now we are using the reserved tier, and we automatically reclaim the resources with a timer. In the case of persistent storage, the decision to reclaim a reservation - even if the host is down for hours - should be entirely in the hands of the service owner. That is at least what we would need to meet the use cases we have here. So what you'll end up needing is another tier, "reallyReserved" or something similar, to be able to disable the automatic unreserving of resources (this is true with the current mechanism in this patch, and with the old reconciliation logic you had). Having two reserved tiers would be confusing for users.
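The timer-based reclamation described above can be sketched as follows. This is illustrative Python, not the actual Aurora or Mesos API; the class and its names are invented for the example.

```python
import time


class ReservedOfferPool:
    """Sketch of expiring unused dynamically reserved offers after a hold
    window. Offers that sit unused past the window would be unreserved and
    returned to the general pool. Hypothetical, for illustration only."""

    def __init__(self, hold_secs):
        self.hold_secs = hold_secs
        self.offers = {}  # offer_id -> timestamp when the reserved offer arrived

    def add(self, offer_id, now=None):
        self.offers[offer_id] = time.time() if now is None else now

    def expired(self, now=None):
        # These are the offers the timer would unreserve. A persistent-storage
        # tier would need this check disabled, hence the "reallyReserved" worry.
        now = time.time() if now is None else now
        return sorted(oid for oid, t in self.offers.items()
                      if now - t > self.hold_secs)
```

The point of the sketch is that the hold window is an operator-level knob; nothing in it gives the service owner a say in when their reservation disappears.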

I'm really not convinced we want to use dynamic reservations for this problem. As I said on the dev list, I think doing this with the same mechanisms as preemption, with some shared state between JobUpdateController and TaskAssigner, is a cleaner solution that leaves the reserved tier open for when we actually need it. At the absolute minimum, if we use DR, we need to hide this fact from the user.
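The restart semantics debated earlier in the thread (sticky rescheduling for reserved tasks, with a possible `unreserve` escape hatch before rescheduling) could be sketched like this. The names are illustrative, not Aurora's actual classes; only the `preemptible`/`revocable` fields mirror the real `TierInfo`, and `reserved` is the flag proposed by the patch.

```python
from dataclasses import dataclass


@dataclass
class TierInfo:
    preemptible: bool = True
    revocable: bool = False
    reserved: bool = False  # hypothetical flag proposed by the patch


def reschedule_target(tier, last_host, unreserve_first=False):
    """Where a restarted task goes under the proposed semantics."""
    if not tier.reserved:
        return "ANY_HOST"   # today's behavior: free placement on restart
    if unreserve_first:
        return "ANY_HOST"   # escape hatch: drop the reservation, then reschedule
    return last_host        # sticky: reuse the reserved resources on the same host
```

The sketch makes the tension concrete: without the `unreserve_first` path, a reserved task can never restart away from a noisy neighbor or a failing host.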

- David

This is an automatically generated e-mail. To reply, visit:

On March 31, 2017, 8:52 p.m., Dmitriy Shirchenko wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> -----------------------------------------------------------
> (Updated March 31, 2017, 8:52 p.m.)
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> Repository: aurora
> Description
> -------
> Esteemed reviewers, here is the latest iteration on the implementation of dynamic reservations. Changes include merging the patches into a single one and an updated design document with a more high-level overview and user stories told from an operator's point of view. Unit tests will be added as soon as we agree on the approach; I have tested this patch on local vagrant and a multi-node dev cluster. The Jenkins build is expected to fail as tests are incomplete.
> For reference, here are the previous two patches whose feedback I addressed in this new single patch:
> Previous 2 patches:
> RFC document:
> Design Doc [UPDATED]:
> Diffs
> -----
>   src/jmh/java/org/apache/aurora/benchmark/ f2296a9d7a88be7e43124370edecfe64415df00f
>   src/jmh/java/org/apache/aurora/benchmark/fakes/ 6f2ca35c5d83dde29c24865b4826d4932e96da80
>   src/main/java/org/apache/aurora/scheduler/ bc40d0798f40003cab5bf6efe607217e4d5de9f1
>   src/main/java/org/apache/aurora/scheduler/ 676dfd9f9d7ee0633c05424f788fd0ab116976bb
>   src/main/java/org/apache/aurora/scheduler/ c45b949ae7946fc92d7e62f94696ddc4f0790cfa
>   src/main/java/org/apache/aurora/scheduler/ c6ad2b1c48673ca2c14ddd308684d81ce536beca
>   src/main/java/org/apache/aurora/scheduler/base/ b12ac83168401c15fb1d30179ea8e4816f09cd3d
>   src/main/java/org/apache/aurora/scheduler/base/ f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49
>   src/main/java/org/apache/aurora/scheduler/configuration/ ad6b3efb69d71e8915044abafacec85f8c9efc59
>   src/main/java/org/apache/aurora/scheduler/events/ f6c759f03c4152ae93317692fc9db202fe251122
>   src/main/java/org/apache/aurora/scheduler/filter/ 36608a9f027c95723c31f9915852112beb367223
>   src/main/java/org/apache/aurora/scheduler/filter/ df51d4cf4893899613683603ab4aa9aefa88faa6
>   src/main/java/org/apache/aurora/scheduler/mesos/ 0d639f66db456858278b0485c91c40975c3b45ac
>   src/main/java/org/apache/aurora/scheduler/offers/ 78255e6dfa31c4920afc0221ee60ec4f8c2a12c4
>   src/main/java/org/apache/aurora/scheduler/offers/ adf7f33e4a72d87c3624f84dfe4998e20dc75fdc
>   src/main/java/org/apache/aurora/scheduler/offers/ 317a2d26d8bfa27988c60a7706b9fb3aa9b4e2a2
>   src/main/java/org/apache/aurora/scheduler/preemptor/ 5ed578cc4c11b49f607db5f7e516d9e6022a926c
>   src/main/java/org/apache/aurora/scheduler/resources/ 291d5c95916915afc48a7143759e523fccd52feb
>   src/main/java/org/apache/aurora/scheduler/resources/ 7040004ae48d3a9d0985cb9b231f914ebf6ff5a4
>   src/main/java/org/apache/aurora/scheduler/resources/ 9aa263a9cfae03a9a0c5bc7fe3a1405397d3009c
>   src/main/java/org/apache/aurora/scheduler/scheduling/
>   src/main/java/org/apache/aurora/scheduler/scheduling/ 03a0e8485d1a392f107fda5b4af05b7f8f6067c6
>   src/main/java/org/apache/aurora/scheduler/scheduling/ 203f62bacc47470545d095e4d25f7e0f25990ed9
>   src/main/java/org/apache/aurora/scheduler/state/ a177b301203143539b052524d14043ec8a85a46d
>   src/main/java/org/apache/aurora/scheduler/stats/ 40451e91aed45866c2030d901160cc4e084834df
>   src/main/resources/org/apache/aurora/scheduler/tiers.json 34ddb1dc769a73115c209c9b2ee158cd364392d8
>   src/test/java/org/apache/aurora/scheduler/ 82e40d509d84c37a19b6a9ef942283d908833840
>   src/test/java/org/apache/aurora/scheduler/configuration/
>   src/test/java/org/apache/aurora/scheduler/http/ 30699596a1c95199df7504f62c5c18cab1be1c6c
>   src/test/java/org/apache/aurora/scheduler/mesos/ 93cc34cf8393f969087cd0fd6f577228c00170e9
>   src/test/java/org/apache/aurora/scheduler/offers/ PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/offers/ d7addc0effb60c196cf339081ad81de541d05385
>   src/test/java/org/apache/aurora/scheduler/resources/ dded9c34749cf599d197ed312ffb6bf63b6033f1
>   src/test/java/org/apache/aurora/scheduler/resources/ b8b8edb1a21ba89b8b60f8f8451c8c776fc23ae8
>   src/test/java/org/apache/aurora/scheduler/resources/ e04f6113c43eca4555ee0719f8208d7c4ebb8d61
>   src/test/java/org/apache/aurora/scheduler/scheduling/
>   src/test/java/org/apache/aurora/scheduler/scheduling/ fa1a81785802b82542030e1aae786fe9570d9827
>   src/test/java/org/apache/aurora/scheduler/sla/ 78f440f7546de9ed6842cb51db02b3bddc9a74ff
>   src/test/java/org/apache/aurora/scheduler/state/ cf2d25ec2e407df7159e0021ddb44adf937e1777
> Diff:
> Testing
> -------
> Tested on local vagrant for the following scenarios:
> * Reserving a task
> * Making sure a returned offer comes back
> * Making sure the offer is unreserved
> Thanks,
> Dmitriy Shirchenko
