aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 32352: Making preemptor asynchronous. Part 3(final) - background service.
Date Tue, 24 Mar 2015 21:24:20 GMT


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java,
line 100
> > <https://reviews.apache.org/r/32352/diff/1/?file=902121#file902121line100>
> >
> >     This regresses on some important characteristics of `TaskGroups`:
> >     
> >     - starvation avoidance: a very large job (even a low priority one) can starve
a single restarting instance
> >     - backoffs: we will perform redundant and possibly hopeless work to try to schedule
the same things without reducing frequency
> >     
> >     It might be easier to retrofit `TaskGroups` to allow injection of the underlying
work than to reimplement those behaviors, but i'm interested in what you think.

>starvation avoidance: a very large job (even a low priority one) can starve a single restarting
instance

I think this can be a fair concern when the number of pending tasks exceeds the number of
theoretical slots in the cluster. I don't think the `TaskGroups` fits here given its penalty
system, rescheduling calculator and executor driven evaluation. We can go with a (hopefully)
simpler `TaskGroup` "circular buffer" interator batch algorithm instead, for example:

Create a list of `TaskGroup` instances ordered by the number of pending tasks in each group
(largest first).
Create (or maintain) a circular *buffer* from the above list.
For each slave:
  While no slot found and we have not looped:
    Evaluate *current* `TaskGroup`
    Move *current* to next
  
After all slaves are evaluated we will have a mapping of `TaskGroup`s to slots that we can
create victims from.

Consider TaskGroup to pending task mapping:
TG1 | TG2 | TG3 | TG4
----|-----|-----|----
1.1 | 2.1 | 3.1 | 4.1
----|-----|-----|----
1.2 | 2.2 | 3.2 | 
----|-----|-----|----
1.3 | 2.3 |     |
----|-----|-----|----
1.4 |     |     |
----|-----|-----|----
1.5 |

and slaves A, B, C, D, E, F, G, H

A possible slot allocation after all slaves are processed could be:

A:1.1, B:NONE, C:3.1, D:4.1, E:1.2, F:2.1, G:3.2, H:1.3

This will give a fair chance to smaller job while making sure larger jobs are not penalized
unnecessarily. 

>backoffs: we will perform redundant and possibly hopeless work to try to schedule the
same things without reducing frequency

I am not concerned about backoffs here. While we are going to run the above algorithm every
minute, we will only evaluate tasks that don't already have slots reserved for. With slot
reservation timeout currently set to 5 minutes, I don't see much redundancy that we would
have to throttle.

If you agree with the above proposal I will add it into AURORA-1219 to address shortly after.


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java,
line 73
> > <https://reviews.apache.org/r/32352/diff/1/?file=902121#file902121line73>
> >
> >     I'm beginning to question this pattern.  It seems reasonable to insist that
the global injector _never_ has a binding for a general type like this.  With that in mind,
you should be comfortable using a private module and an unqualified binding, saving this kind
of clutter.  What do you think?

I am not keen on this idea given that the clutter will shift from attributes to private modules
but I willing to give it a try to possibly it within the module itself.


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java, lines 227-229
> > <https://reviews.apache.org/r/32352/diff/1/?file=902120#file902120line227>
> >
> >     Isn't this scenario transparent to the scheduler, though?  If i understand you
correctly, an available reservation means the slot has been freed and is now a reserved Offer,
which the assigner function handles.
> >     
> >     However, another nice TODO would be to eventually remove the fallback to preemption
here.  Seems reasonable that this body of code only speaks in Offers, and the newly-established
territory is responsible for generating offers from tasks when it sees fit.

I am not quite sure what you mean here. If by "newly-established territory" you mean preemptor
then I don't see it a good fit for acting on its own quite yet. There is always a chance a
PENDING task is killed after a slot is found and preemptor will kill tasks which resources
will never get reclaimed.

The idea I tried to capture by this TODO is elimination of Round 3 below:
- Round 1: Pending task fails to schedule, no slot found in preemptor
- Some time later a slot is found by the preemptor
- Round 2: Pending task fails to schedule, calls into preemptor and preempts tasks on a slave.
**Slave reservation is created but can't be used right away as currently coded.**
- Round 3: Pending task fails to schedule, reservation is found and task is finally scheduled.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32352/#review77609
-----------------------------------------------------------


On March 21, 2015, 2:19 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32352/
> -----------------------------------------------------------
> 
> (Updated March 21, 2015, 2:19 a.m.)
> 
> 
> Review request for Aurora, Bill Farner and Zameer Manji.
> 
> 
> Bugs: AURORA-1158
>     https://issues.apache.org/jira/browse/AURORA-1158
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Summary of changes:
> - The PreemptorImpl now only hosts slot validation and task preemption logic and requires
a write transaction.
> - PendingTaskProcessor is called every minute to pull all available PENDING tasks and
search for preemption slots.
> - TaskScheduler has no connection to slot search anymore. It now completely relies on
periodic PreemptionService to find available slots.
> - preemption_delay is reduced from 10 to 3 minutes.
> 
> Benchmark refactoring will be addressed separately.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 5309e8140fff411da30ee87c1b3b1a55d6fdaeeb

>   src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java 1b9d741dba7b9c2663ff119cd44adc8403c0d257

>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java
PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptionSlotCache.java
4ca36e5fe2cfd326f4d4f37f70dbcd0060109e73 
>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptionSlotFinder.java
84bcdc57d798aca229d4f184cae065ec4dcf8fc5 
>   src/main/java/org/apache/aurora/scheduler/async/preemptor/Preemptor.java 84791a272f245c729706af95d70c7f1631bfe99c

>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java 18a2e6032ba86ff7efab4d42a4d83798a1d06b06

>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorMetrics.java 782e751f5b05391ebeee4d947570cc174dddece2

>   src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorModule.java 7034a07eae1f94d7a0bbccdf8146cf3ed0a5424e

>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java da7b662c3ca4040221805cba81e5ac7b64fb3df4

>   src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 29fe156da19f3c08af00a951bb3baccf2b97a6cb

>   src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java f5c2128e90eb5c849d68431225661d1cfa7da0cc

>   src/test/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessorTest.java
PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImplTest.java d17c4fb513afdb7d8ef6d7c2b0aef86c1f47c082

>   src/test/java/org/apache/aurora/scheduler/async/preemptor/PreemptorModuleTest.java
0e2e958a053e5cee280b947f7349c076e2addb45 
> 
> Diff: https://reviews.apache.org/r/32352/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> Manual testing in vagrant.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message