aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 20398: Scheduler SLA metrics.
Date Wed, 23 Apr 2014 17:29:31 GMT


> On April 22, 2014, 7:57 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 65
> > <https://reviews.apache.org/r/20398/diff/7/?file=562468#file562468line65>
> >
> >     I'm still failing to see the upside of this enum.  An obvious downside is that
MetricCalculator.METRICS must be manually kept in sync with this, but a 1:1 mapping is expected.
 Why not push the whole thing up there?

This enum keeps all algorithm details private to the interface exposing only the configurations
that intended to be consumed. It better aligns test with prod as tests can't create algorithms
with parameters other than what's visible to MetricCalculator. The approach is consistent
with the SlaGroup.GroupType and will help isolating future algorithm development/reuse from
the consumption. It also helps readability on the declaration side (METRICS). 


> On April 22, 2014, 7:57 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, line 179
> > <https://reviews.apache.org/r/20398/diff/7/?file=562467#file562467line179>
> >
> >     Can you avoid the extra flag and loop if you just let Counter keep track of
it?
> >     
> >       class Counter {
> >         Counter(..) {
> >           // Object is initialized, but value not yet valid
> >         }
> >     
> >         private void set(..) {
> >           // Value provided is guaranteed to be up to date
> >           if (!exported) {
> >             statsProvider.makeGauge(name, this);
> >             exported = true;
> >           }
> >           ...
> >         }
> >       }
> >     
> >     I would find this easier to follow.

Good idea. Done.


> On April 22, 2014, 7:57 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java, line 40
> > <https://reviews.apache.org/r/20398/diff/7/?file=562471#file562471line40>
> >
> >     list.isEmpty()

Done.


> On April 22, 2014, 7:57 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java, line 55
> > <https://reviews.apache.org/r/20398/diff/7/?file=562475#file562475line55>
> >
> >     funky indent

Fixed.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20398/#review41053
-----------------------------------------------------------


On April 18, 2014, 8:43 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20398/
> -----------------------------------------------------------
> 
> (Updated April 18, 2014, 8:43 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-293
>     https://issues.apache.org/jira/browse/AURORA-293
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> High level overview:
> - MetricCalculator runs periodically (every minute), pulls all task history, packages
it into SlaInstance list and updates stats;
> - Stat calculation is handled by a pair of: SlaAlgorithm and a set of applicable SlaGroups
(logical groupings by job, cluster, resource and etc.);
> - Stat name is generated by combining group and algorithm name parts.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java eeafc784e915137cacd5f64df1252ccbaf6c0f6c

>   src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java bc77bf2e6fbc1ff4159d049b0c28afd6832499ef

>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java fae2d237d18f945d4dd73ec56cd42981359bea46

>   src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java 56f47b6545ed618b5ed754badfdbcaebdf6fc222

>   src/main/java/org/apache/aurora/scheduler/quota/ResourceAggregates.java 444c2872ae8fcf1683e9eea6f38cd42877641366

>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/stats/SlotSizeCounter.java b6610f76f2ac665f258ecd05df367cfd54b97d58

>   src/test/java/org/apache/aurora/scheduler/quota/QuotaManagerImplTest.java 14af68f4c53019f0d5d9947e0eece15e4375176a

>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java PRE-CREATION

>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/stats/SlotSizeCounterTest.java 415b48cee907f9d325fb25e03a290eda1b8cc09b

>   src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplTest.java 4a13dbc5485b4563af93548ebc50b01de2303968

> 
> Diff: https://reviews.apache.org/r/20398/diff/
> 
> 
> Testing
> -------
> 
> gradle build
> gradle run
> 
> Sample from local scheduler:
> 
> sla_cluster_mtta 3271.493
> sla_cluster_mttr 3273.497
> sla_cluster_platform_uptime_percent 100.0
> sla_cpu_small_mtta 3271.493
> sla_cpu_small_mttr 3273.497
> sla_disk_small_mtta 3271.493
> sla_disk_small_mttr 3273.497
> sla_mesos_test_serviceJob0_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob0_mtta 0.0
> sla_mesos_test_serviceJob0_mttr 0.0
> sla_mesos_test_serviceJob0_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob10_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob10_mtta 0.0
> sla_mesos_test_serviceJob10_mttr 0.0
> sla_mesos_test_serviceJob10_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob12_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob12_mtta 0.0
> sla_mesos_test_serviceJob12_mttr 0.0
> sla_mesos_test_serviceJob12_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob14_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob14_mtta 0.0
> sla_mesos_test_serviceJob14_mttr 0.0
> sla_mesos_test_serviceJob14_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob16_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob16_mtta 0.0
> sla_mesos_test_serviceJob16_mttr 0.0
> sla_mesos_test_serviceJob16_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob18_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob18_mtta 0.0
> sla_mesos_test_serviceJob18_mttr 0.0
> sla_mesos_test_serviceJob18_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob2_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob2_mtta 0.0
> sla_mesos_test_serviceJob2_mttr 0.0
> sla_mesos_test_serviceJob2_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob4_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob4_mtta 0.0
> sla_mesos_test_serviceJob4_mttr 0.0
> sla_mesos_test_serviceJob4_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob6_job_uptime_75_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_90_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_95_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_99_00_sec 7125
> sla_mesos_test_serviceJob6_mtta 3271.493
> sla_mesos_test_serviceJob6_mttr 3273.497
> sla_mesos_test_serviceJob6_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob8_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob8_mtta 0.0
> sla_mesos_test_serviceJob8_mttr 0.0
> sla_mesos_test_serviceJob8_platform_uptime_percent 100.0
> sla_ram_small_mtta 3271.493
> sla_ram_small_mttr 3273.497
> 
> 
> File Attachments
> ----------------
> 
> Coverage report
>   https://reviews.apache.org/media/uploaded/files/2014/04/16/ffe00c63-bb3a-4b90-95f8-f23878b0fdab__SLA_coverage_report.png
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message