sling-dev mailing list archives

From "Timothee Maret (Jira)" <>
Subject [jira] [Closed] (SLING-8531) Support JournalAvailabilityChecker exponential backoff
Date Thu, 14 Nov 2019 16:04:00 GMT


Timothee Maret closed SLING-8531.

> Support JournalAvailabilityChecker exponential backoff 
> -------------------------------------------------------
>                 Key: SLING-8531
>                 URL:
>             Project: Sling
>          Issue Type: Improvement
>          Components: Content Distribution
>    Affects Versions: Content Distribution Journal Core 0.1.2
>            Reporter: Timothee Maret
>            Assignee: Christian Schneider
>            Priority: Major
>            Fix For: Content Distribution Journal Core 0.1.4, Content Distribution Journal Kafka 0.1.4, Content Distribution Journal Messages 0.1.2
>          Time Spent: 20m
>  Remaining Estimate: 0h
> The average load generated by JournalAvailabilityChecker multiplies quickly in multi-tenant deployments. The checker can be configured (via the Sling Scheduler {{scheduler.period}}) to reduce the polling frequency, but doing so also reduces its sensitivity to availability changes.
> To improve sensitivity, we should support an exponential backoff algorithm. The algorithm would halve the polling rate (that is, double the polling period, up to a limit) every time the availability status does not change, and reset it when the status changes. Steady states (available or unavailable) would eventually yield the least load. In the average case (the availability status is steady), the load will be reduced by a factor of up to the limit. In the worst case (the availability status changes all the time), the load will not be reduced compared to the current behaviour.
> The base period would be the Sling Scheduler {{scheduler.period}}. The period at time t + 1 would be computed as follows: Period~t+1~ = Multiplier~t+1~ * {{scheduler.period}}. The table below summarises how the multiplier would evolve according to the availability status change.
> ||State~t~||State~t+1~||Multiplier~t+1~||
> |unavailable|unavailable|min(2 * Multiplier~t~, limit)|
> |unavailable|available|1|
> |available|unavailable|1|
> |available|available|min(2 * Multiplier~t~, limit)|
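The multiplier rule in the table can be read as a small state machine. The Java sketch below is a minimal, hypothetical illustration: the class `BackoffMultiplier` and its methods are not part of the Sling code base. It assumes that "up to a limit" means the doubled multiplier is capped as min(2 * Multiplier, limit), and that the very first check, which has no previous state to compare against, starts at multiplier 1.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the multiplier rule; names are illustrative only.
public final class BackoffMultiplier {

    private final int limit;
    private int multiplier = 1;
    private Boolean lastAvailable; // null until the first check has run (assumption)

    public BackoffMultiplier(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
    }

    /** Records the outcome of a check and returns the multiplier for the next one. */
    public int update(boolean available) {
        if (lastAvailable != null && lastAvailable == available) {
            // Status unchanged: double the multiplier, capped at the limit.
            multiplier = Math.min(2 * multiplier, limit);
        } else {
            // First check, or the status changed: fall back to the base period.
            multiplier = 1;
        }
        lastAvailable = available;
        return multiplier;
    }

    /** Delay before the next check, given the base scheduler.period. */
    public long nextPeriod(long basePeriod) {
        return multiplier * basePeriod;
    }

    public static void main(String[] args) {
        BackoffMultiplier backoff = new BackoffMultiplier(16);
        boolean[] observations = {true, true, true, true, true, true, false, false};
        StringJoiner out = new StringJoiner(" ");
        for (boolean available : observations) {
            out.add(Integer.toString(backoff.update(available)));
        }
        System.out.println(out); // prints: 1 2 4 8 16 16 1 2
    }
}
```

Keeping the multiplier separate from the scheduling code means the rule can be unit-tested without a running scheduler.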
> The limit would be hardcoded to 16, which would reduce the load by an order of magnitude in steady states; we could expose the limit as a configuration option later if needed.
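To sanity-check the order-of-magnitude estimate, the hypothetical simulation below counts how many checks run over a horizon of 1000 base periods (units of {{scheduler.period}}). It assumes checks take no time and that the status is only observed at check times; `BackoffLoadSimulation` and `countChecks` are illustrative names, and a limit of 1 stands in for the current, non-backoff behaviour.

```java
import java.util.function.IntPredicate;

// Hypothetical simulation of check counts with and without backoff.
public final class BackoffLoadSimulation {

    /**
     * Counts checks in [0, horizon), where time is measured in base periods and
     * availableAt tells the status observed by a check at time t.
     */
    static int countChecks(int horizon, int limit, IntPredicate availableAt) {
        int checks = 0;
        int multiplier = 1;
        Boolean last = null; // no previous observation yet
        int t = 0;
        while (t < horizon) {
            boolean available = availableAt.test(t);
            checks++;
            if (last != null && last == available) {
                multiplier = Math.min(2 * multiplier, limit); // unchanged: back off
            } else {
                multiplier = 1; // first check or status change: reset
            }
            last = available;
            t += multiplier; // next check after multiplier * scheduler.period
        }
        return checks;
    }

    public static void main(String[] args) {
        int horizon = 1000;
        IntPredicate steady = t -> true;
        IntPredicate flapping = t -> t % 2 == 0; // flips on every base period
        System.out.println("steady, no backoff: " + countChecks(horizon, 1, steady));
        System.out.println("steady, limit 16:   " + countChecks(horizon, 16, steady));
        System.out.println("flapping, limit 16: " + countChecks(horizon, 16, flapping));
    }
}
```

Run as is, it prints 1000, 66 and 1000: in a steady state the checker runs roughly 15 times less often, while a status that flips on every base period gets no reduction, which matches the worst case described in the issue.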
> There should be no need to randomise the multiplier for now, as the checkers are expected to be started at random times. If we hit a scenario where the checkers start at the same time, we could simply randomise the first scheduled event.
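If checkers ever do start in lockstep, randomising the first scheduled event could look like the sketch below. It uses plain `java.util.concurrent` rather than the Sling Scheduler API, and both the class name `JitteredStart` and the choice of a uniform offset in [0, base period) are assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: delay the first check by a random offset so that
// checkers started together drift apart. Not the Sling Scheduler API.
public final class JitteredStart {

    /** Returns a uniformly random delay in [0, basePeriodMillis). */
    static long initialDelayMillis(long basePeriodMillis) {
        return ThreadLocalRandom.current().nextLong(basePeriodMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long basePeriodMillis = 1_000;
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long delay = initialDelayMillis(basePeriodMillis);
        executor.schedule(
                () -> System.out.println("first check after " + delay + " ms"),
                delay, TimeUnit.MILLISECONDS);
        // By default, already-scheduled delayed tasks still run after shutdown().
        executor.shutdown();
        executor.awaitTermination(2, TimeUnit.SECONDS);
    }
}
```

Only the first run is jittered; subsequent checks would then follow the backoff period as usual.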

This message was sent by Atlassian Jira
