aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Sirois" <jsir...@apache.org>
Subject Re: Review Request 42332: Trigger shutdown on task pruning failure.
Date Fri, 15 Jan 2016 23:47:38 GMT


> On Jan. 15, 2016, 4:33 p.m., John Sirois wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, line 152
> > <https://reviews.apache.org/r/42332/diff/1/?file=1197802#file1197802line152>
> >
> >     You could centralize this handling in deleteTasks - it has all the ids to give
as useful a message as you have here, and that localizes the otherwise overly broad looking
RTE catch.  It'll clean up the temp you had to introduce below as well.  Further, it'll give
you 1 nice spot to add a comment explaining why termination on 1 failed prune is sane; ie:
you can point out the inevitable history growth issue and that rolling back is better than
waiting for the failure and then rolling back.
> 
> Zameer Manji wrote:
>     The objective is to catch any unexpected exceptions in the Runnable given to the
executor. For example the exception that triggered this investigation did not come from delete
tasks but from sorting the set of tasks to delete:
>     ````
>     FluentIterable
>             .from(Tasks.LATEST_ACTIVITY.sortedCopy(inactiveTasks))
>             .filter(safeToDelete)
>             .limit(tasksToPrune)
>             .transform(Tasks::id)
>             .toSet();
>     ````
>     
>     I agree that this approach is not ideal. I'm not sure on how to change this without
changing the executor to have some sort of ListenableFuture that does something `onFailure`.

Right, and that sounds like a great idea to me, it would be nice to have a more centralized
way to deal with this and a ListenableFutures provide it:
  http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/ListeningExecutorService.html
  http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/MoreExecutors.html#listeningDecorator(java.util.concurrent.ExecutorService)
  
Not sure if you want to pursue that idea and drop this or do this change, add the larger change
then flip over to it later.


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114795
-----------------------------------------------------------


On Jan. 14, 2016, 5:53 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
> 
> (Updated Jan. 14, 2016, 5:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Task pruning is critical to the operation of a large cluster. Failure to prune tasks
can lead to storage growing in unexpected ways leading to scheduling slowdown. This patch
adds shutdown on task pruning failure to prevent the scheduler from entering an undefined
state.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb

>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d

> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message