aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zameer Manji" <zma...@apache.org>
Subject Re: Review Request 42332: Trigger shutdown on task pruning failure.
Date Fri, 15 Jan 2016 23:41:00 GMT


> On Jan. 15, 2016, 3:33 p.m., John Sirois wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, line 152
> > <https://reviews.apache.org/r/42332/diff/1/?file=1197802#file1197802line152>
> >
> >     You could centralize this handling in deleteTasks - it has all the ids to give
as useful a message as you have here, and that localizes the otherwise overly broad looking
RTE catch.  It'll clean up the temp you had to introduce below as well.  Further, it'll give
you 1 nice spot to add a comment explaining why termination on 1 failed prune is sane; ie:
you can point out the inevitable history growth issue and that rolling back is better than
waiting for the failure and then rolling back.

The objective is to catch any unexpected exceptions in the Runnable given to the executor.
For example the exception that triggered this investigation did not come from delete tasks
but from sorting the set of tasks to delete:
````
FluentIterable
        .from(Tasks.LATEST_ACTIVITY.sortedCopy(inactiveTasks))
        .filter(safeToDelete)
        .limit(tasksToPrune)
        .transform(Tasks::id)
        .toSet();
````

I agree that this approach is not ideal. I'm not sure on how to change this without changing
the executor to have some sort of ListenableFuture that does something `onFailure`.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114795
-----------------------------------------------------------


On Jan. 14, 2016, 4:53 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
> 
> (Updated Jan. 14, 2016, 4:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Task pruning is critical to the operation of a large cluster. Failure to prune tasks
can lead to storage growing in unexpected ways leading to scheduling slowdown. This patch
adds shutdown on task pruning failure to prevent the scheduler from entering an undefined
state.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb

>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d

> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message