aurora-reviews mailing list archives

From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 42332: Turn TaskHistoryPruner into a service and trigger shutdown on pruning failure.
Date Thu, 21 Jan 2016 22:19:26 GMT


> On Jan. 21, 2016, 6:51 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, lines 152-161
> > <https://reviews.apache.org/r/42332/diff/3/?file=1203998#file1203998line152>
> >
> >     Am I reading this as a busy loop consuming 100% thread CPU waiting for something that may never happen? I don't think this is an acceptable solution.
> >     
> >     Perhaps it's time to refactor the task pruner into an AbstractScheduledService? I always felt the task pruner's approach of holding on to task IDs for 2 days just to act once on them isn't very efficient. What if, instead of acting on every particular task ID, we have a periodic (say every 30 seconds) run loop to prune job keys instead?
> >     
> >     Implementation-wise, it could be a Set of unique job keys that we populate on every TaskStateChange event. A runOneIteration() would poll that set and apply both latency and max_per_job thresholds for all related terminal tasks within the same iteration.
> >     
> >     The only downside for the above is a somewhat increased history count between the cleanup runs, but given that our current thresholds are chosen mostly arbitrarily I think that should be acceptable.
> 
> John Sirois wrote:
>     I think my Future/Queue suggestion above solves the busy loop with no liveness penalty. That might allow your batching change suggestion to happen in a separate follow-up RB.
> 
> Zameer Manji wrote:
>     +1 to John here. I think we are overdue for a less complex and heavy pruner, but I would prefer to keep this RB focused on failure propagation. I am open to a follow-up ticket and RB. Maxim, if you agree, I can create a ticket that tracks the work you just proposed.
>     
>     Right now, I think I will use the Future/Queue suggestion that John has to remove the busy loop.

I am fine with the follow-up ticket if you feel it's too much to lift within this RB.
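
For that ticket, the periodic job-key loop quoted above could be sketched roughly as below. This is only an illustration under assumptions: it is plain Java with no Guava dependency, `JobKeyPrunerSketch`, `recordTerminalTask` and the `pruneJob` callback are hypothetical names, job keys are simplified to strings, and the actual latency and max_per_job threshold logic is left out. In the scheduler the class would presumably extend Guava's AbstractScheduledService, which calls runOneIteration() on a schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch: collects job keys and prunes each one once per iteration. */
class JobKeyPrunerSketch {
  // Unique job keys that had a terminal task since the last iteration.
  private final Set<String> pendingJobKeys = ConcurrentHashMap.newKeySet();

  // Stand-in for the real pruning logic (latency and max_per_job thresholds).
  private final Consumer<String> pruneJob;

  JobKeyPrunerSketch(Consumer<String> pruneJob) {
    this.pruneJob = pruneJob;
  }

  /** Assumed to be called from the TaskStateChange handler for terminal tasks. */
  void recordTerminalTask(String jobKey) {
    pendingJobKeys.add(jobKey); // duplicates collapse, so state stays per job, not per task
  }

  /** One periodic pass; returns how many job keys were processed. */
  int runOneIteration() {
    // Snapshot and clear first, so keys recorded during pruning wait for the next pass.
    List<String> batch = new ArrayList<>(pendingJobKeys);
    pendingJobKeys.removeAll(batch);
    for (String jobKey : batch) {
      // An exception here is deliberately not caught; it escapes the iteration.
      pruneJob.accept(jobKey);
    }
    return batch.size();
  }
}
```

Since an exception from the callback escapes runOneIteration(), a Guava AbstractScheduledService wrapper would transition to FAILED on a pruning error, which lines up with the failure propagation this RB is after.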


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review115663
-----------------------------------------------------------


On Jan. 20, 2016, 10:39 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 10:39 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Task pruning is key to operating a large cluster, and failure to prune should trigger shutdown to prevent unbounded growth of storage. This patch turns `TaskHistoryPruner` into a service which propagates failure from failed pruning attempts towards the `ServiceManager`. It also completes a TODO by removing a test for behaviour that is very awkward to test for.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 735199ac1ccccab343c24471890aa330d6635c26
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb
>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> e2e tests
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>

