aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 56575: AURORA-1837 Improve task history pruning
Date Wed, 15 Feb 2017 00:07:04 GMT


> On Feb. 14, 2017, 4:02 p.m., Mehrdad Nurolahzade wrote:
> > Looking at it as is, I'm not sure if there is much value to be gained from pushing
this down to `TaskStore`.
> > Do you see any value in pursuing this idea any further? Or shall I restore it to
previous state?

I think there is value in moving operations like this down to storage.
First, it mirrors the update store for pruning old updates.
Second, I think we can benefit from more precise storage operations. It allows us to leverage
the power of DbTaskStore (ie we don't need to hydrate a full object for deletion) and ensures
that callers don't do excessive work.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56575/#review165599
-----------------------------------------------------------


On Feb. 14, 2017, 3:38 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56575/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2017, 3:38 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Bugs: AURORA-1837
>     https://issues.apache.org/jira/browse/AURORA-1837
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This patch addressed efficiency issues in the current implementation of `TaskHistoryPruner`.
The new design is similar to that of `JobUpdateHistoryPruner`: (a) Instead of registering
a `DelayExecutor` run upon terminal task state transitions, it runs on preconfigured intervals,
finds all terminal state tasks that meet pruning criteria and deletes them. (b) Makes the
initial task history pruning delay configurable so that it does not hamper scheduler upon
start.
> 
> The new design addressed the following two efficiecy problems:
> 
> 1. Upon scheduler restart/failure, the in-memory state of task history pruning scheduled
with `DelayExecutor` is lost. `TaskHistoryPruner` learns about these dead tasks upon restart
when log is replayed. These expired tasks are picked up by the second call to `executor.execute()`
that performs job level pruning immediately (i.e., without delay). Hence, most task history
pruning happens after scheduler restarts and can severely hamper scheduler performance (or
cause consecutive fail-overs on test clusters when we put load test on scheduler).
> 
> 2. Expired tasks can be picked up for pruning multiple times. The asynchronous nature
of `BatchWorker` which used to process task deletions introduces some delay between delete
enqueue and delete execution. As a result, tasks already queued for deletion in a previous
evaluation round might get picked up, evaluated and enqueued for deletion again. This is evident
in `tasks_pruned` metric which reflects numbers much higher than the actual number of expired
tasks deleted.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 735199ac1ccccab343c24471890aa330d6635c26

>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java f77849498ff23616f1d56d133eb218f837ac3413

>   src/main/java/org/apache/aurora/scheduler/storage/TaskStore.java 1094a122fe836e53d0481ee5c097447f1e91fa0a

>   src/main/java/org/apache/aurora/scheduler/storage/db/DbTaskStore.java a649a6e3d2f2d0aeaf6d7ac704ed24911c310a1e

>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java d89e715b1b08faf95f8b5788c9c28cbbb33af093

>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 14e4040e0b94e96f77068b41454311fa3bf53573

>   src/test/java/org/apache/aurora/scheduler/storage/mem/InMemTaskStoreTest.java 02719c312294b58525c1fddd3ed096a9b1cef601

> 
> Diff: https://reviews.apache.org/r/56575/diff/
> 
> 
> Testing
> -------
> 
> Manual testing under Vagrant
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message