aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David McLaughlin <da...@dmclaughlin.com>
Subject Re: Review Request 59699: Improve task history pruning by batch deleting tasks
Date Thu, 01 Jun 2017 02:02:59 GMT


> On June 1, 2017, 12:53 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 388 (original), 411 (patched)
> > <https://reviews.apache.org/r/59699/diff/1/?file=1736080#file1736080line411>
> >
> >     I'm not sure this is the right approach. I don't think we want to skip writing
transactions to the replicated log. 
> >     
> >     All of the APIs down to the storage layer already support batching so seems
like we just want to move the delete command outside of the code that generates pubsub events.
E.g. call taskStore.deleteTasks(taskIds) here when we receive the batch from the prune tasks
endpoint:
> >     
> >     https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L371
> >     
> >     And then remove the taskStore.deleteTasks from the PubsubEvent factory method
(have no idea why it's in there).
> 
> Kai Huang wrote:
>     Moving the taskStore.deleteTasks out will break some other operations:
>     e.g.
>     When we kill a pending task, we will also delete the task (generates a PubsubEvent
and calls taskStore.deleteTasks). https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java#L215
>     
>     
>     We need to add taskStore.deleteTasks for all other use cases that indirectly calls
taskStore.deleteTasks.
> 
> Kai Huang wrote:
>     Instead of rewriting the code, I think I just need to refactor those tests mentioned
above. Currently they were all written in a way that assumes the taskStore.deleteTasks was
included in the PubSubEvent factory.

I see. Also looking at the code it seems here we do actually still write a transaction, so
the boolean flag you've introduced is more like "skipDelete" rather than "skipTransaction".
Having a skipDelete flag in a method called deleteTasks would be a pretty big code smell :)

deleteTasks is only called by the TaskHistoryPruner and the admin endpoint, and is always
a batch endpoint - so I think the flaw in this code was pushing it into the per-task state
machine code. All we want in our deleteTasks code is:

    @Override
    public void deleteTasks(MutableStoreProvider storeProvider, final Set<String> taskIds)
{
      TaskStore.Mutable taskStore = storeProvider.getUnsafeTaskStore();
      taskStore.deleteTasks(taskIds);
      Set<PubsubEvent> events = taskIds.stream().map(id -> deleteTasks(taskStore,
id)).collect(...);
      eventSink.post(events);
    }
    
This would require moving the taskStore.deletedTasks out of the event creation code (which
I would consider renaming from deleteTasks to taskDeletedEvent or something) and into the
side-effect handler like so:


    case DELETED:
      taskStore.deleteTasks(...);
      events.add(deleteTasks(taskStore, ...));
  
  
Does that make sense?


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176567
-----------------------------------------------------------


On June 1, 2017, 12:18 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 12:18 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
>     https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the cluster has
a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to avoid additional
cost of coalescing. The fix will also benefit implicit task history pruning since it has similar
underlying implementation. See https://issues.apache.org/jira/browse/AURORA-1929 for more
information and details.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 73878210f9028901fda3b08e66c6a63c24260d35

> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/1/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message