aurora-reviews mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
Date Wed, 14 Sep 2016 23:18:13 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/
-----------------------------------------------------------

(Updated Sept. 14, 2016, 11:18 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
-------

Really rebasing.


Repository: aurora


Description
-------

This is the final part of the `BatchWorker` conversion work; it converts `TaskScheduler` to use `BatchWorker`.
See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`.

#####Problem
See https://reviews.apache.org/r/51759

#####Remediation
Task scheduling is one of the dominant users of the write lock. It is also one of the
heaviest and most latency-sensitive. As such, the default max batch size is set conservatively
low (3), and batch items are executed in a blocking way.
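
To illustrate the idea of small, blocking batches, here is a minimal sketch. It is not Aurora's `BatchWorker`; the class `BatchSketch` and its methods `submit`, `runOneBatch`, and `lockAcquisitions` are hypothetical names invented for this example, and a plain `ReentrantLock` stands in for the storage write lock. It only shows how up to `maxBatchSize` queued items can share one lock acquisition while each caller waits on its own result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not Aurora's BatchWorker.
final class BatchSketch<T> {
  // A queued unit of work plus the future its submitter waits on.
  private static final class Item<T> {
    final Supplier<T> work;
    final CompletableFuture<T> result = new CompletableFuture<>();
    Item(Supplier<T> work) { this.work = work; }
  }

  private final LinkedBlockingQueue<Item<T>> queue = new LinkedBlockingQueue<>();
  private final ReentrantLock writeLock = new ReentrantLock(); // stand-in for the write lock
  private final int maxBatchSize;
  private int lockAcquisitions;

  BatchSketch(int maxBatchSize) { this.maxBatchSize = maxBatchSize; }

  // Enqueues work; the caller can block on the returned future with join().
  CompletableFuture<T> submit(Supplier<T> work) {
    Item<T> item = new Item<>(work);
    queue.add(item);
    return item.result;
  }

  // Drains at most maxBatchSize items and runs them under one lock acquisition.
  // Returns the number of items processed (0 if the queue was empty).
  int runOneBatch() {
    List<Item<T>> batch = new ArrayList<>();
    queue.drainTo(batch, maxBatchSize);
    if (batch.isEmpty()) {
      return 0;
    }
    writeLock.lock();
    try {
      lockAcquisitions++;
      for (Item<T> item : batch) {
        try {
          item.result.complete(item.work.get());
        } catch (RuntimeException e) {
          item.result.completeExceptionally(e);
        }
      }
    } finally {
      writeLock.unlock();
    }
    return batch.size();
  }

  int lockAcquisitions() { return lockAcquisitions; }
}
```

In this sketch a scheduling thread would call `submit(...).join()` to wait for its item, which is one way to read "executed in a blocking way"; a small `maxBatchSize` bounds how long the lock is held per batch.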

Attempting to make task scheduling non-blocking resulted in much worse scheduling performance.
The way our `DBTaskStore` is wired, all async activities, including the `EventBus`, are bound to
use a single async `Executor`, which is currently limited to 8 threads [1]. Relying on the
same `EventBus` to deliver scheduling completion events slowed scheduling down,
as those events were backed up behind all other activities, including task status events,
reconciliation, etc. Increasing the executor thread pool size, on the
other hand, also increased lock contention, defeating the whole purpose of this work.
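
The queuing effect described above can be sketched in isolation. This is a hypothetical example, not Aurora code: `SharedExecutorSketch` and the event names are made up, and the pool is reduced to a single thread (rather than the 8 mentioned above) purely so that the ordering is deterministic. It shows a completion event submitted to a shared executor running only after the backlog queued ahead of it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; names and pool size are assumptions.
final class SharedExecutorSketch {
  // Queues `backlog` unrelated events, then one completion event, on the same
  // executor, and returns the order in which they actually ran.
  static List<String> run(int backlog) {
    ExecutorService shared = Executors.newFixedThreadPool(1);
    List<String> order = new CopyOnWriteArrayList<>();
    try {
      for (int i = 0; i < backlog; i++) {
        String name = "status-event-" + i;
        shared.execute(() -> order.add(name));
      }
      // The scheduling completion event waits behind everything queued above.
      CompletableFuture<Void> done =
          CompletableFuture.runAsync(() -> order.add("scheduling-complete"), shared);
      done.join();
    } finally {
      shared.shutdown();
    }
    return order;
  }
}
```

With more threads the wait shrinks but does not disappear whenever the backlog exceeds the pool size, and, as noted above, a larger pool brought its own cost in lock contention.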

#####Results
See https://reviews.apache.org/r/51759 for the lock contention results.

[1] https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d

  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7

  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9

  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff

  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26

  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79

  src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed


Diff: https://reviews.apache.org/r/51765/diff/


Testing
-------

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko

