lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (SOLR-11320) Lock autoscaling triggers when changes they requested are being made
Date Thu, 26 Oct 2017 14:37:00 GMT


ASF subversion and git services commented on SOLR-11320:

Commit ed611a085134df9257bc6ac6ba4bef37ff3b514a in lucene-solr's branch refs/heads/master
from [~ab]

SOLR-11320: Lock autoscaling triggers when changes they requested are being made.

> Lock autoscaling triggers when changes they requested are being made
> --------------------------------------------------------------------
>                 Key: SOLR-11320
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
> Autoscaling triggers generate events that are then processed by actions such as ComputePlanAction
and ExecutePlanAction. This process is far from instantaneous: moving or adding replicas, for
example, may take several seconds or even minutes.
> The original condition that caused the first event will usually persist during this time,
and once the {{waitFor}} time has elapsed it will lead to a new event being generated,
which will be queued for execution after the previous actions have completed. By that time,
however, the original condition may have been alleviated by these actions, so the conditions
reported in the new event no longer reflect the latest cluster state.
> For this reason some autoscaling frameworks introduce a "cooldown" period, during which triggers
are temporarily disabled to avoid piling up new events while cluster
changes are being made. This method introduces a fixed delay that is specific to a trigger.
> From the point of view of control theory, a feedback loop design should minimize inherent
delays because they are very hard to compensate for properly. They either lead to instability
(when the controller tries to compensate for an out-of-step state) or to increased system
lag (the system reacts sluggishly to changes because it has to wait for things to settle down).
From this point of view a fixed delay, which is also hard to estimate properly and may
be inadequate depending on varying conditions, is not ideal.
> A better alternative would be to lock the trigger only for as long as changes are actually
being made. Initially this could be implemented as a global lock for all
triggers for the duration of modifications performed by ExecutePlanAction.
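
A minimal sketch of the global-lock idea, for illustration only. The class and method names (GlobalTriggerLock, mayFire, runExclusively) are hypothetical and are not taken from the Solr code base; the committed change may be structured quite differently. The point is that triggers stay locked for exactly as long as the plan runs, rather than for a fixed cooldown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a single lock shared by all triggers; not Solr's implementation. */
final class GlobalTriggerLock {
    // True while cluster modifications are being carried out.
    private final AtomicBoolean actionsInProgress = new AtomicBoolean(false);

    /** A trigger would check this before generating an event; false means "locked". */
    boolean mayFire() {
        return !actionsInProgress.get();
    }

    /**
     * Runs the plan while holding the lock and releases it when the plan returns or throws.
     * Returns false without running the plan if another plan is already executing.
     */
    boolean runExclusively(Runnable plan) {
        if (!actionsInProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            plan.run();
            return true;
        } finally {
            actionsInProgress.set(false);
        }
    }
}
```

This only works as intended if plan.run() returns once the changes have actually taken effect, which is why the asynchronous execution discussed next matters.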
> Currently, cluster modifications executed by ExecutePlanAction are made asynchronously,
so it's hard to determine when the changes actually take effect, e.g. when a new (or moved)
replica becomes active. This would have to be changed as well.
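
One way to picture the last point: if each asynchronous operation exposed a completion signal, the lock could be released only after all of them have finished. The sketch below assumes, purely for illustration, that every operation is represented by a CompletableFuture; AsyncPlanTracker and its methods are hypothetical names and do not describe how Solr tracks its asynchronous requests.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: keeps triggers locked until every async operation has finished. */
final class AsyncPlanTracker {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    /** True while at least one tracked plan is still in flight. */
    boolean isLocked() {
        return locked.get();
    }

    /**
     * Locks the triggers and unlocks them once all operations have completed,
     * whether normally or exceptionally. Throws if a plan is already being tracked.
     */
    CompletableFuture<Void> track(List<? extends CompletableFuture<?>> operations) {
        if (!locked.compareAndSet(false, true)) {
            throw new IllegalStateException("a plan is already executing");
        }
        return CompletableFuture
                .allOf(operations.toArray(new CompletableFuture<?>[0]))
                .whenComplete((ignored, error) -> locked.set(false));
    }
}
```

In this sketch a "move replica" operation would complete its future only when the new replica is reported active, so the unlock coincides with the change taking effect rather than with the request being submitted.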
