aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 20398: Scheduler SLA metrics.
Date Wed, 23 Apr 2014 23:50:28 GMT


> On April 23, 2014, 10:16 p.m., Kevin Sweeney wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, line 67
> > <https://reviews.apache.org/r/20398/diff/8/?file=565882#file565882line67>
> >
> >     You can save a good amount of wiring boilerplate if you use Guava's AbstractScheduledService
[1] here instead. Perhaps drop a TODO?
> >     
> >     [1] http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/AbstractScheduledService.html

The AbstractScheduledService is good for maintaining a stateful service, which MetricCalculator
is not. I.e. there is nothing to do in start() or stop() right now. I gave it a try anyway
to see if it at least handles errors properly but no, I still had to use afterExecute() hack
to surface any unhandled exceptions. 


> On April 23, 2014, 10:16 p.m., Kevin Sweeney wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java, line 42
> > <https://reviews.apache.org/r/20398/diff/8/?file=565884#file565884line42>
> >
> >     logical grouping of what?

Clarified.


> On April 23, 2014, 10:16 p.m., Kevin Sweeney wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java, line 68
> > <https://reviews.apache.org/r/20398/diff/8/?file=565885#file565885line68>
> >
> >     again, this could be simplified with AbstractScheduledService

Unfortunately, it does not.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20398/#review41201
-----------------------------------------------------------


On April 23, 2014, 5:42 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20398/
> -----------------------------------------------------------
> 
> (Updated April 23, 2014, 5:42 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-293
>     https://issues.apache.org/jira/browse/AURORA-293
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> High level overview:
> - MetricCalculator runs periodically (every minute), pulls all task history, packages
it into SlaInstance list and updates stats;
> - Stat calculation is handled by a pair of: SlaAlgorithm and a set of applicable SlaGroups
(logical groupings by job, cluster, resource and etc.);
> - Stat name is generated by combining group and algorithm name parts.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 2bedf9bbbccf4c43760cd68b5825c44f279a63ca

>   src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java bc77bf2e6fbc1ff4159d049b0c28afd6832499ef

>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java fae2d237d18f945d4dd73ec56cd42981359bea46

>   src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java 56f47b6545ed618b5ed754badfdbcaebdf6fc222

>   src/main/java/org/apache/aurora/scheduler/quota/ResourceAggregates.java 444c2872ae8fcf1683e9eea6f38cd42877641366

>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/stats/SlotSizeCounter.java b6610f76f2ac665f258ecd05df367cfd54b97d58

>   src/test/java/org/apache/aurora/scheduler/quota/QuotaManagerImplTest.java 14af68f4c53019f0d5d9947e0eece15e4375176a

>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java PRE-CREATION

>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/stats/SlotSizeCounterTest.java 415b48cee907f9d325fb25e03a290eda1b8cc09b

>   src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplTest.java 4a13dbc5485b4563af93548ebc50b01de2303968

> 
> Diff: https://reviews.apache.org/r/20398/diff/
> 
> 
> Testing
> -------
> 
> gradle build
> gradle run
> 
> Sample from local scheduler:
> 
> sla_cluster_mtta 3271.493
> sla_cluster_mttr 3273.497
> sla_cluster_platform_uptime_percent 100.0
> sla_cpu_small_mtta 3271.493
> sla_cpu_small_mttr 3273.497
> sla_disk_small_mtta 3271.493
> sla_disk_small_mttr 3273.497
> sla_mesos_test_serviceJob0_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob0_mtta 0.0
> sla_mesos_test_serviceJob0_mttr 0.0
> sla_mesos_test_serviceJob0_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob10_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob10_mtta 0.0
> sla_mesos_test_serviceJob10_mttr 0.0
> sla_mesos_test_serviceJob10_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob12_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob12_mtta 0.0
> sla_mesos_test_serviceJob12_mttr 0.0
> sla_mesos_test_serviceJob12_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob14_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob14_mtta 0.0
> sla_mesos_test_serviceJob14_mttr 0.0
> sla_mesos_test_serviceJob14_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob16_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob16_mtta 0.0
> sla_mesos_test_serviceJob16_mttr 0.0
> sla_mesos_test_serviceJob16_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob18_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob18_mtta 0.0
> sla_mesos_test_serviceJob18_mttr 0.0
> sla_mesos_test_serviceJob18_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob2_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob2_mtta 0.0
> sla_mesos_test_serviceJob2_mttr 0.0
> sla_mesos_test_serviceJob2_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob4_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob4_mtta 0.0
> sla_mesos_test_serviceJob4_mttr 0.0
> sla_mesos_test_serviceJob4_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob6_job_uptime_75_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_90_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_95_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_99_00_sec 7125
> sla_mesos_test_serviceJob6_mtta 3271.493
> sla_mesos_test_serviceJob6_mttr 3273.497
> sla_mesos_test_serviceJob6_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob8_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob8_mtta 0.0
> sla_mesos_test_serviceJob8_mttr 0.0
> sla_mesos_test_serviceJob8_platform_uptime_percent 100.0
> sla_ram_small_mtta 3271.493
> sla_ram_small_mttr 3273.497
> 
> 
> File Attachments
> ----------------
> 
> Coverage report
>   https://reviews.apache.org/media/uploaded/files/2014/04/16/ffe00c63-bb3a-4b90-95f8-f23878b0fdab__SLA_coverage_report.png
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message