aurora-reviews mailing list archives

From Maxim Khutornenko
Subject Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
Date Fri, 16 Sep 2016 21:04:48 GMT

(Updated Sept. 16, 2016, 9:04 p.m.)

Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.



Repository: aurora


This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking
kill followups and reduces the number of trigger threads. See
for more background on the `BatchWorker`.

The current cron scheduling implementation relies on a large number of threads (`cron_scheduler_num_threads`=100)
to support cron triggering and to kill existing tasks according to the `KILL_EXISTING` collision
policy. This creates large spikes of activity at synchronized intervals, as users tend to
schedule their cron runs at similar times. Moreover, the current implementation re-acquires
write locks multiple times to deliver on the `KILL_EXISTING` policy.

Trigger-level batching is still done in a blocking way, but multiple cron triggers may be bundled
together to share the same write transaction. Any followups, however, are performed in a non-blocking
way by relying on `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification.
To still ensure non-concurrent execution of a given job key trigger, a token (job
key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup
is still in progress (token is set AND no entry in the `killFollowups` set exists yet).
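
To make the token check concrete, here is a minimal, self-contained sketch of the idea. It is not
the code in this diff: the class `CronTriggerSketch`, its methods, and the `Outcome` enum are
hypothetical names, the "write transaction" is only a counter, and the asynchronous kill followup
(done via `BatchWorker.executeWithReplay()` in the real change) is reduced to a plain callback.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Aurora's actual classes.
class CronTriggerSketch {
  enum Outcome { LAUNCHED, KILL_STARTED, SKIPPED }

  // Tokens: job keys that have an outstanding kill followup.
  private final Set<String> tokens = new HashSet<>();
  // Job keys whose kill followup has completed but has not been consumed yet.
  private final Set<String> killFollowups = new HashSet<>();
  // Stand-in for the task store: job keys that currently have active tasks.
  private final Set<String> activeJobs = new HashSet<>();
  // Counts simulated write transactions, to show that a bundle shares one.
  int writeTransactions = 0;

  CronTriggerSketch(Set<String> initiallyActive) {
    activeJobs.addAll(initiallyActive);
  }

  // Processes a bundle of triggers inside one simulated write transaction.
  synchronized Map<String, Outcome> triggerBatch(List<String> jobKeys) {
    writeTransactions++;
    Map<String, Outcome> results = new LinkedHashMap<>();
    for (String key : jobKeys) {
      results.put(key, triggerOne(key));
    }
    return results;
  }

  private Outcome triggerOne(String key) {
    if (tokens.contains(key)) {
      if (!killFollowups.contains(key)) {
        // Token is set AND no followup entry exists yet: the kill is still in progress, bail.
        return Outcome.SKIPPED;
      }
      // The followup has completed: consume it, clear the token, and launch the new run.
      tokens.remove(key);
      killFollowups.remove(key);
      activeJobs.add(key);
      return Outcome.LAUNCHED;
    }
    if (activeJobs.contains(key)) {
      // KILL_EXISTING collision: save the token and let the kill finish without blocking.
      tokens.add(key);
      return Outcome.KILL_STARTED;
    }
    activeJobs.add(key);
    return Outcome.LAUNCHED;
  }

  // Stand-in for the completion notification of the non-blocking kill followup.
  synchronized void onKillCompleted(String key) {
    activeJobs.remove(key);
    killFollowups.add(key);
  }

  public static void main(String[] args) {
    CronTriggerSketch cron = new CronTriggerSketch(new HashSet<>(Arrays.asList("job/a")));
    System.out.println(cron.triggerBatch(Arrays.asList("job/a", "job/b")));
    System.out.println(cron.triggerBatch(Arrays.asList("job/a")));
    cron.onKillCompleted("job/a");
    System.out.println(cron.triggerBatch(Arrays.asList("job/a")));
  }
}
```

In this sketch the first bundle starts a kill for `job/a` and launches `job/b` in the same
transaction, the second trigger of `job/a` bails because its followup is still pending, and the
third one launches once the completion callback has recorded the followup.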

The above approach allowed reducing the number of cron threads to 10, and the number can likely be
reduced even further. See for the lock contention results.

Diffs (updated)

  commons/src/main/java/org/apache/aurora/common/util/ 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7

  commons/src/test/java/org/apache/aurora/common/util/ bc30990d57f444f7d64805ed85c363f1302736d0

  src/main/java/org/apache/aurora/scheduler/cron/quartz/ c07551e94f9221b5b21c5dc9715e82caa290c2e8

  src/main/java/org/apache/aurora/scheduler/cron/quartz/ 155d702d68367b247dd066f773c662407f0e3b5b

  src/test/java/org/apache/aurora/scheduler/cron/quartz/ 5c64ff2994e200b3453603ac5470e8e152cebc55

  src/test/java/org/apache/aurora/scheduler/cron/quartz/ 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f



All types of testing, including deploying to test and production clusters.


Maxim Khutornenko
