aurora-reviews mailing list archives

From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 29943: Uptime-driven scheduler job updates
Date Tue, 24 Feb 2015 19:39:59 GMT


> On Feb. 24, 2015, 7:30 p.m., Kevin Sweeney wrote:
> > Is this ready for review now?

It is. However, since AURORA-1041 is still Open, I am going to discard it and repost when
the ticket moves to Accepted.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
-----------------------------------------------------------


On Jan. 20, 2015, 9:12 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29943/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2015, 9:12 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-1041
>     https://issues.apache.org/jira/browse/AURORA-1041
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the first take on implementing job uptime-driven updates. In addition to the good old "batch_size", instances can now be dispatched in an arbitrary sequence depending on the overall uptime (health) of the job.
> 
> The uptime is specified by a tuple of **waitForUptimeMs** and **waitForUptimePercentInstances** values. An excerpt from api.thrift explaining the feature:
> ```
> /**
>    * The uptime-driven update throttles the number of instances being updated at any given moment
>    * according to the job uptime calculations. The "X% of instances up over Y interval" invariant
>    * is preserved over the entire job update lifetime. No new instances are dispatched for update
>    * unless that invariant is satisfied. Instances are dispatched in their natural uptime order,
>    * shortest uptime first.
>    *
>    * For example, when set as below the update will block until at least 90% of job instances are in
>    * RUNNING state for at least 1 minute:
>    *    waitForUptimeMs = 60000
>    *    waitForUptimePercentInstances = 90
>    *
>    * When using uptime-driven update, it's expected that updateGroupSize is left unset to allow job
>    * uptime settings drive the update progress. However, if updateGroupSize is set it will be
>    * pre-applied before SLA uptime calculations to determine the update working set. As a side
>    * effect, the updateGroupSize results in a natural ordering of instances taken for each group
>    * (instances within a group are still updated in a "shortest uptime first" order).
>    *
>    * For example, if set as below the number of instances being updated at any given moment will
>    * never exceed 5 even though the uptime calculations may allow more than 5:
>    *    updateGroupSize = 5
>    *    waitForUptimeMs = 60000
>    *    waitForUptimePercentInstances = 90
>    *
>    * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
>    * graceful throttled rollback as unstable/flapping instances may never yield an acceptable uptime
>    * to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND the
>    * updateGroupSize=0 the updater will dispatch all affected instances at once.
>    * Use rollbackOnFailure=True with caution for uptime-driven updates.
>    */
> ```
> 
> For reviewers: recommend starting with api.thrift and then proceeding to the InstanceUptimeStrategy.java that implements the core algo.
> 
> TODO: 
> - vagrant e2e test
> - more corner case unit test coverage in JobUpdaterIT
> - client warning message in case uptime specs are used with client updater
> - docs
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
>   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
>   src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
>   src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
>   src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
>   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde
>   src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java ad9126c32893080e128d086ea3bfd7ad23d27b89
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5
>   src/test/java/org/apache/aurora/scheduler/updater/TaskUtil.java 0e67f91536ff89c07da9be82049719c854aa3d62
>   src/test/java/org/apache/aurora/scheduler/updater/UpdateFactoryImplTest.java d6e855b879e7909e8ba66c03ed34c845bf978a8f
>   src/test/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategyTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategyTest.java PRE-CREATION
>   src/test/python/apache/aurora/client/api/test_api.py ff1aff2eac391f219bc7c2483a16e35f916a224c
>   src/test/python/apache/aurora/client/api/test_updater.py dd3f228c5062d388b4393aa4fd5b60a685bdb3a6
>   src/test/python/apache/aurora/client/api/test_updater_util.py fe3ac49491ca710761632405ac09de0cc0d038a5
> 
> Diff: https://reviews.apache.org/r/29943/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> ./pants src/test/python:all
> manual testing in vagrant
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
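
For readers following the api.thrift excerpt quoted above, the dispatch rule it describes might be sketched roughly as follows. This is only an illustration in Python, not the code from InstanceUptimeStrategy.java: the function name, its parameters, and the data shapes are hypothetical, and two behaviors are assumptions rather than anything stated in the review: instances that are already below the uptime threshold are treated as free to dispatch, and ties in uptime are broken by instance ID.

```python
from typing import Dict, List, Optional


def next_instances_to_update(
    uptimes_ms: Dict[int, int],
    pending: List[int],
    wait_for_uptime_ms: int,
    wait_for_uptime_percent: int,
    update_group_size: Optional[int] = None,
    in_flight: int = 0,
) -> List[int]:
    """Pick the pending instance IDs that may be dispatched right now.

    uptimes_ms maps every instance ID in the job to its current uptime;
    an instance that is down or restarting is assumed to report 0.
    """
    total = len(uptimes_ms)
    # Number of instances that must stay "up" (integer ceiling division).
    required_up = -(-total * wait_for_uptime_percent // 100)

    up = {i for i, ms in uptimes_ms.items() if ms >= wait_for_uptime_ms}
    # Nothing new is dispatched unless the invariant currently holds.
    if len(up) < required_up:
        return []
    # How many "up" instances can be taken down without breaking it.
    budget = len(up) - required_up

    # Group size, when set, is applied first and in natural (ID) order.
    working_set = sorted(pending)
    if update_group_size:  # None or 0 means no group-size cap
        slots = max(update_group_size - in_flight, 0)
        working_set = working_set[:slots]

    # Within the working set: shortest uptime first, ties by ID (assumed).
    working_set.sort(key=lambda i: (uptimes_ms[i], i))

    chosen = []
    for instance_id in working_set:
        if instance_id in up:
            if budget == 0:
                break  # every remaining candidate is "up" as well
            budget -= 1
        # Assumption: dispatching an instance that is not "up" costs nothing.
        chosen.append(instance_id)
    return chosen
```

With 10 instances that have all been up for two minutes, waitForUptimeMs = 60000 and waitForUptimePercentInstances = 90, this sketch releases one instance at a time. With 100 such instances and updateGroupSize = 5 it releases five, even though the uptime budget alone would allow ten.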

