aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Santhosh Kumar Shanmugham <santhoshkuma...@gmail.com>
Subject Re: Review Request 54439: Add support for an mttu metric (median time to updated)
Date Fri, 09 Dec 2016 08:45:54 GMT


> On Dec. 7, 2016, 10:28 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> >     Why do we only sample active updates, seems like we could miss data points?
Especially for small updates.
> 
> Joshua Cohen wrote:
>     My thinking was that the vast majority of updates in the store will be completed
hours or days ago, so there's no need to consider them when calculating the mttu. You're right,
this does mean that we might lose some data points for tasks that moved to `ASSIGNED` in the
same `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update completed.
>     
>     For reference, some general stats from one of our clusters: currently at off-peak
hours, .02% of all updates in the update store are active. It's hard to say with certainty,
historically how many updates were active at any given time. But anecdotatlly it's a small
fraction of the total number of updates in the store, generously speaking I'd say 1-2%. That
being the case, by including only active updates in the calculation, we reduce the work to
be done by anywhere from 98 to 99.98 percent.
>     
>     I feel like this is a fair trade off to make, but I'm not steadfast in that opinion.

We can add a storage method that will give all the `InstanceUpdateEvent`s during the last
`SLA_REFRESH_INTERVAL` and use that to determine the `activeUpdates` that will be looked into,
this can give a much more accurate value.


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
-----------------------------------------------------------


On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and Santhosh Kumar
Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to the subsequent
`ASSIGNED` event for the task with the same instance id that matches the desired task config
from the update details.
> 
> My original approach to this involved converting `GroupType` and `AlgorithmType` from
enums (which cannot be generic) to static classes (which, of course, can). This allowed me
to avoid unnecessarily passing update details to the `calculate` method of `SlaAlgorithm`
since it's ignored in all but the one, new case. However, that ended up being a lot of churn,
and since it turns out we need both the task details and the update details to calculate this
metric, I went with the below approach. If anyone feels strongly, I could go back to generics
and create an container class that's gives access to both the tasks and update details.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 9a56cda809fbbcb07e6dd12c7a0feb272542491d

>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 5d8d5bd8f705770979f284d26d2e932aabe707e5

>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 6fbd4e962b3bb6eeb0831c810a321478fd52172c

>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 953b65f28a585375e36e305dea6f9f94f99abc93

>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 2e719ac6b7aea86faa22deff2cc6b5f73135761c

>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 341e346e794c9cf9a2789b8799f38fff900ec9b3

>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 78f440f7546de9ed6842cb51db02b3bddc9a74ff

>   src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 21d26b3930ea965487b2dec48a48a98677ba022b

>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73

>   src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 6d0e9bc6a8040393875d4f0a88e8db9d6926a88b

> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message