aurora-reviews mailing list archives

From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 27705: Adding instrumentation into the scheduling pipeline.
Date Fri, 14 Nov 2014 16:59:20 GMT


> On Nov. 14, 2014, 2:24 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java, line 226
> > <https://reviews.apache.org/r/27705/diff/2/?file=763034#file763034line226>
> >
> >     To get the data we want, some extra analysis is needed.  Specifically - if we want to figure out how often a scheduling attempt is vetoed _only_ for static reasons (e.g. insufficient resources), these stats will lack signal.
> >     
> >     Instead, we probably want two counters:
> >     - scheduling_veto_static
> >     - scheduling_veto_dynamic
> >     
> >     Does that make sense?
> 
> Maxim Khutornenko wrote:
>     I don't see how more granular data would prevent us from aggregating into static/dynamic groups. However, having only aggregate metrics would make it impossible to do any further analysis when needed. Why not go the more specific route instead? I would have a hard time figuring out what "scheduling_veto_static" means without digging through the sources, whereas something like "scheduling_veto_INSUFFICIENT_RESOURCES" would immediately make sense by itself.
> 
> Bill Farner wrote:
>     The problem is that you can't discern when a task didn't match due to _only_ static reasons.  Relevant code in `SchedulingFilterImpl`:
>     
>         return ImmutableSet.<Veto>builder()
>             .addAll(getConstraintFilter(attributeAggregate, attributes).apply(task))
>             .addAll(getResourceVetoes(offer, task))
>             .build();
>             
>     On the other end, when you increment counters:
>     
>         for (Veto veto : event.getVetoes()) {
>           counters.getUnchecked(vetoStatName(veto)).increment();
>         }
>     
>     At this point, you might get vetoes like: `insufficient CPU`, `insufficient RAM`, `insufficient ports`, `limit not satisfied: host`.
>     You'll end up with these counter deltas:
>     
>     `INSUFFICIENT_RESOURCES 3`
>     `LIMIT_NOT_SATISFIED 1`
>     
>     As a result, I don't see how we could look at the stats and convince ourselves which optimization has the greatest payoff, since a single scheduling round affects multiple counters disproportionately.

Isn't it the same problem with the aggregate counters? I.e. in the above example we would
still see static=1 (or 3?) and dynamic=1.
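
To make the "1 or 3" question concrete, here is a minimal sketch of the two ways aggregate counters could be incremented for that example. The `Kind` enum, the class name, and the classification of `limit not satisfied: host` as dynamic are assumptions made for illustration; none of them come from the patch.

```java
import java.util.Arrays;
import java.util.List;

public class AggregateVetoCounting {
  // Hypothetical grouping of vetoes; not a type from the Aurora sources.
  enum Kind { STATIC, DYNAMIC }

  // Per-veto counting: every veto bumps its group's counter.
  static long[] countPerVeto(List<Kind> vetoes) {
    long[] counts = new long[Kind.values().length];
    for (Kind kind : vetoes) {
      counts[kind.ordinal()]++;
    }
    return counts;
  }

  // Per-round counting: each group is bumped at most once per scheduling attempt.
  static long[] countPerRound(List<Kind> vetoes) {
    long[] counts = new long[Kind.values().length];
    for (Kind kind : Kind.values()) {
      if (vetoes.contains(kind)) {
        counts[kind.ordinal()] = 1;
      }
    }
    return counts;
  }

  // True only when the attempt was vetoed and no veto was dynamic.
  static boolean staticOnly(List<Kind> vetoes) {
    return !vetoes.isEmpty() && !vetoes.contains(Kind.DYNAMIC);
  }

  public static void main(String[] args) {
    // insufficient CPU, insufficient RAM, insufficient ports, limit not satisfied: host
    List<Kind> round = Arrays.asList(Kind.STATIC, Kind.STATIC, Kind.STATIC, Kind.DYNAMIC);
    System.out.println("per veto:  " + Arrays.toString(countPerVeto(round)));
    System.out.println("per round: " + Arrays.toString(countPerRound(round)));
    System.out.println("static only: " + staticOnly(round));
  }
}
```

Under these assumptions neither pair of counters says whether a round was vetoed for static reasons alone; that would need a separate per-round check along the lines of `staticOnly`.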

To address your concern of excessive counting, how about maintaining unique veto type counters
instead? Something like this:
```java
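    // Group vetoes by type, then bump each type's counter once per scheduling attempt.
    ListMultimap<VetoType, Veto> index = Multimaps.index(event.getVetoes(), VETO_TO_VETO_TYPE);
    // keySet() yields each type once; keys() would repeat a type for every veto of that type.
    for (VetoType vetoType : index.keySet()) {
      counters.getUnchecked(vetoStatName(vetoType)).increment();
    }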
```
For the above example, it would produce:

scheduling_veto_INSUFFICIENT_RESOURCES 1
scheduling_veto_LIMIT_NOT_SATISFIED 1
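
For anyone who wants to try the counting idea outside the scheduler, here is a self-contained sketch that uses only the JDK. The `VetoType` and `Veto` types below are simplified stand-ins and the class name is hypothetical; the real classes and the stat naming in the patch may differ.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

public class UniqueVetoTypeCounters {
  // Simplified stand-ins for the scheduler's types (assumed for illustration).
  enum VetoType { INSUFFICIENT_RESOURCES, LIMIT_NOT_SATISFIED }

  static final class Veto {
    final VetoType type;
    final String reason;

    Veto(VetoType type, String reason) {
      this.type = type;
      this.reason = reason;
    }
  }

  private final Map<VetoType, Long> counters = new EnumMap<>(VetoType.class);

  // Increment each veto type at most once per scheduling attempt.
  void record(List<Veto> vetoes) {
    EnumSet<VetoType> seen = EnumSet.noneOf(VetoType.class);
    for (Veto veto : vetoes) {
      seen.add(veto.type);
    }
    for (VetoType type : seen) {
      counters.merge(type, 1L, Long::sum);
    }
  }

  long get(VetoType type) {
    return counters.getOrDefault(type, 0L);
  }

  public static void main(String[] args) {
    UniqueVetoTypeCounters stats = new UniqueVetoTypeCounters();
    stats.record(Arrays.asList(
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient CPU"),
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient RAM"),
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient ports"),
        new Veto(VetoType.LIMIT_NOT_SATISFIED, "limit not satisfied: host")));
    for (VetoType type : VetoType.values()) {
      System.out.println("scheduling_veto_" + type + " " + stats.get(type));
    }
  }
}
```

The `EnumSet` does the deduplication here, so three resource vetoes in one attempt move the resource counter by one.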


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27705/#review61385
-----------------------------------------------------------


On Nov. 14, 2014, 12:30 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27705/
> -----------------------------------------------------------
> 
> (Updated Nov. 14, 2014, 12:30 a.m.)
> 
> 
> Review request for Aurora, Bill Farner and Zameer Manji.
> 
> 
> Bugs: AURORA-914
>     https://issues.apache.org/jira/browse/AURORA-914
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Adding @Timed to trace scheduling latencies and Veto counters per type.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java cf8f7584afee758c527798914181049051aef0d8
>   src/main/java/org/apache/aurora/scheduler/async/OfferQueue.java d2682cd910d248c897e691bcb4c8a3a6f1aec2d2
>   src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java e2ba8b8fe978a58d1edcd01963ea020e54529353
>   src/main/java/org/apache/aurora/scheduler/filter/ConstraintFilter.java 3839083f27ca5d4b93406152559b58b04e912a10
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java c1c5f26723f1eac3000e09e061b4582f922fded6
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java cc6b53b3265253f76c1e954c0108aa5936f5cc36
>   src/main/java/org/apache/aurora/scheduler/metadata/NearestFit.java 87203690f09456ac1ca5e9da2b82826d60cbd723
>   src/main/java/org/apache/aurora/scheduler/stats/CachedCounters.java aaedb3b5ec2cb27550449435efa8f335c6a9baad
>   src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 12ea4c67350c2992f59bacd21a99d1413b60b757
>   src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java 94f0a179b786649775899f855f7c1a0caab7290f
>   src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java e113eba1f304279b5ee3d70db1d1ea558efd63ac
>   src/test/java/org/apache/aurora/scheduler/metadata/NearestFitTest.java b60b004adbd6753ec6fef125fd70286be5071c56
>   src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java 5c9ea6cf4eb4d99d94f5d61e784dd7c9c480798c
> 
> Diff: https://reviews.apache.org/r/27705/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> Verified new stats in vagrant.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

