aurora-reviews mailing list archives

From Mehrdad Nurolahzade <mehr...@apache.org>
Subject Re: Review Request 59030: AURORA-1869 Reducing storage write lock contention in TaskStatusHandlerImpl
Date Fri, 05 May 2017 21:54:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59030/#review174086
-----------------------------------------------------------



@ReviewBot retry

- Mehrdad Nurolahzade


On May 5, 2017, 2:36 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59030/
> -----------------------------------------------------------
> 
> (Updated May 5, 2017, 2:36 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1869
>     https://issues.apache.org/jira/browse/AURORA-1869
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `TaskStatusHandlerImpl` acquires the `LogStorage` write lock to process every status
update received from the Mesos master. During implicit and explicit reconciliation, that means
one write lock acquisition per task in the cluster (tens of thousands in our cluster).
> 
> According to data extracted from one of our production clusters, over 99.9% of reconciliation
status update events are in fact `NOOP` status updates. The storage write lock contention
induced by these status updates can be eliminated by adopting the double-checked locking
pattern (as was done in [AURORA-1820](https://issues.apache.org/jira/browse/AURORA-1820)).
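> 
> A minimal sketch of the double-checked locking idea described above, not the actual patch:
it assumes a plain `ReentrantReadWriteLock` and an in-memory map in place of `LogStorage`, and
the class and method names (`DoubleCheckedStatusHandler`, `handleStatusUpdate`, `seed`) are
hypothetical. The first check runs under the shared read lock, so `NOOP` updates never touch
the write lock; the second check repeats under the write lock because the state may have
changed between the two acquisitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the status handler; not Aurora's real API. */
public class DoubleCheckedStatusHandler {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Assumption: task state is modeled as a simple taskId -> state map.
  private final Map<String, String> taskStates = new HashMap<>();
  // Counts write lock acquisitions made by handleStatusUpdate only.
  private int writeLockAcquisitions = 0;

  /** Test helper: stores an initial state for a task. */
  public void seed(String taskId, String state) {
    lock.writeLock().lock();
    try {
      taskStates.put(taskId, state);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns true if the update changed the stored state, false for a NOOP. */
  public boolean handleStatusUpdate(String taskId, String newState) {
    // First check: shared read lock only, so NOOP updates do not contend
    // with other writers.
    lock.readLock().lock();
    try {
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
    } finally {
      lock.readLock().unlock();
    }

    // Second check: another writer may have applied the same transition
    // after the read lock was released, so re-check under the write lock.
    lock.writeLock().lock();
    try {
      writeLockAcquisitions++;
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
      taskStates.put(taskId, newState);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Number of times handleStatusUpdate had to take the write lock. */
  public int getWriteLockAcquisitions() {
    lock.readLock().lock();
    try {
      return writeLockAcquisitions;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

In this sketch, a reconciliation pass that re-reports the current state of every task takes
only read locks, which is the case the description says covers over 99.9% of events.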
> 
> This explains why the combination of reconciliation status update processing and other
expensive processes like snapshotting can be fatal for the scheduler. Because the lock is not
fair, it does not guarantee any particular access order. Snapshot structures might therefore
need to sit on the heap for a few seconds before they can be written to `LogStorage` and
garbage collected.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java 1aacecf3c2597a3f91dbc7da4c99fd1e80970f04
>   src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java 56a6b0c9ae8da18e9a47428b8ed37a559cfd04e7
>   src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 21d26b3930ea965487b2dec48a48a98677ba022b
> 
> 
> Diff: https://reviews.apache.org/r/59030/diff/1/
> 
> 
> Testing
> -------
> 
> TBD: to be verified on a test cluster.
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

