ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Baseline auto-adjust`s discuss
Date Thu, 24 Jan 2019 18:53:44 GMT
Hi Anton,

This is a great feature, but I am a bit confused about the feature being
disabled automatically during manual baseline adjustment. This may lead to
unpleasant situations: a user enables auto-adjustment, later re-adjusts the
baseline manually (e.g. from some previously created script) so that the
disabling of auto-adjustment goes unnoticed, and then adds more nodes
expecting auto-baseline to still be active.

Instead, I would rather make manual and auto adjustment mutually exclusive:
the baseline cannot be adjusted manually when auto mode is set, and vice
versa. If an exception is thrown in such cases, administrators will always
know the current behavior of the system.
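
To make this concrete, here is a minimal sketch of such a guard. The class
and method names are hypothetical and are not existing Ignite API; the point
is only that the conflicting operation fails loudly instead of silently
switching modes.

```java
/** Hypothetical guard; none of these names come from Ignite. */
final class BaselineModeGuard {
    private boolean autoAdjustEnabled;

    void setAutoAdjustEnabled(boolean enabled) {
        autoAdjustEnabled = enabled;
    }

    /** Called before a manual baseline change; rejects it while auto mode is on. */
    void checkManualAdjustAllowed() {
        if (autoAdjustEnabled)
            throw new IllegalStateException(
                "Baseline auto-adjust is enabled; disable it before changing the baseline manually.");
    }

    /** Called before an automatic baseline change; rejects it while manual mode is on. */
    void checkAutoAdjustAllowed() {
        if (!autoAdjustEnabled)
            throw new IllegalStateException(
                "Baseline auto-adjust is disabled; the baseline can only be changed manually.");
    }
}
```

With this shape, an old script that sets the baseline manually would fail
with a clear message rather than turn auto-adjustment off unnoticed.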

As for configuration, wouldn't it be enough to have a single long value
as opposed to Boolean + long? Say, 0 means immediate auto-adjustment,
negative means disabled, and positive means auto-adjustment after that
timeout.
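
A minimal sketch of how that single value could be interpreted. The names
are hypothetical, and treating the value as milliseconds is my assumption;
the proposal itself does not fix a unit.

```java
/** Hypothetical interpretation of a single long setting; not Ignite API. */
final class AutoAdjustSetting {
    enum Mode { DISABLED, IMMEDIATE, DELAYED }

    /** Negative: disabled. Zero: adjust immediately. Positive: adjust after the timeout. */
    static Mode modeOf(long timeoutMillis) {
        if (timeoutMillis < 0)
            return Mode.DISABLED;

        return timeoutMillis == 0 ? Mode.IMMEDIATE : Mode.DELAYED;
    }
}
```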

Thoughts?

Thu, 24 Jan 2019 at 18:33, Anton Kalashnikov <kaa.dev@yandex.ru>:

>
> Hello, Igniters!
>
> Work on Phase II of IEP-4 (Baseline topology) [1] has started. I want
> to start a discussion of the implementation of "Baseline auto-adjust" [2].
>
> The "Baseline auto-adjust" feature implements a mechanism that adjusts the
> baseline to match the current topology after a node join/left event
> occurs. It is required because when a node leaves the grid and nobody
> changes the baseline manually, data can be lost (if more nodes leave the
> grid, depending on the backup factor), but permanent tracking of the grid
> is not always possible/desirable. It looks like in many cases
> auto-adjusting the baseline after some timeout would be very helpful.
>
> Distributed metastore [3] (it is already done):
>
> First of all, we need the ability to store configuration data
> consistently and cluster-wide. Ignite doesn't have any specific API for
> such configurations and we don't want to have many similar implementations
> of the same feature in our code. After some thought it was proposed to
> implement it as some kind of distributed metastorage that gives the ability
> to store any data in it.
> The first implementation is based on the existing local metastorage API for
> persistent clusters (in-memory clusters will store data in memory).
> Write/remove operations use Discovery SPI to send updates to the cluster,
> which guarantees the order of updates and the fact that all existing
> (alive) nodes have handled the update message. As a way to find out which
> node has the latest data there is a "version" value of the distributed
> metastorage, which is basically <number of all updates, hash of updates>.
> The history of all updates back to some point in the past is stored along
> with the data, so when an outdated node connects to the cluster it will
> receive all the missing data and apply it locally. If there's not enough
> history stored or the joining node is clean, it will receive a snapshot of
> the distributed metastorage, so there won't be inconsistencies.
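
The version and catch-up logic described above could be sketched roughly as
follows. This is only an illustration under my own assumptions: the class
names, the string representation of an update, and the hash formula are all
hypothetical and are not taken from the actual implementation in [3]. The
hash is carried in the version but not used for comparison in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical version pair: <number of all updates, hash of updates>. */
final class MetastoreVersion {
    final long updates;
    final long hash;

    MetastoreVersion(long updates, long hash) {
        this.updates = updates;
        this.hash = hash;
    }

    /** Version after applying one more update (hash formula is illustrative). */
    MetastoreVersion next(String update) {
        return new MetastoreVersion(updates + 1, 31 * hash + update.hashCode());
    }
}

/** Hypothetical bounded history of updates kept along with the data. */
final class MetastoreHistory {
    private final List<String> retained = new ArrayList<>();
    private long dropped; // number of oldest updates no longer kept in history
    private MetastoreVersion version = new MetastoreVersion(0, 0);

    MetastoreVersion version() {
        return version;
    }

    void apply(String update) {
        retained.add(update);
        version = version.next(update);
    }

    /** Discards the n oldest retained updates. */
    void trimOldest(int n) {
        retained.subList(0, n).clear();
        dropped += n;
    }

    /**
     * Updates that a joining node with the given update count is missing,
     * or empty if the node is clean or the history is too short, in which
     * case it would need a full snapshot instead.
     */
    Optional<List<String>> missingUpdatesFor(long joinerUpdates) {
        if (joinerUpdates == 0 || joinerUpdates < dropped || joinerUpdates > version.updates)
            return Optional.empty();

        int from = (int)(joinerUpdates - dropped);

        return Optional.of(new ArrayList<>(retained.subList(from, retained.size())));
    }
}
```
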
>
> Baseline auto-adjust:
>
> Main scenario:
>         - There is a grid whose baseline is equal to the current topology
>         - A new node joins the grid or some node leaves (fails)
>         - The new mechanism detects this event and adds a task for changing
> the baseline to a queue with the configured timeout
>         - If a new event happens before the baseline is changed, the task
> is removed from the queue and a new task is added
>         - When the timeout expires, the task tries to set a new baseline
> corresponding to the current topology
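
The main scenario above is essentially a debounce. A minimal sketch of it,
with hypothetical names and with time passed in explicitly so that the logic
is deterministic (a real implementation would presumably use a timer or
executor rather than polling):

```java
/** Hypothetical debouncer for baseline changes; not taken from the draft PR. */
final class AutoAdjustDebouncer {
    private final long timeoutMillis;
    private long deadline = -1; // -1 means there is no pending task

    AutoAdjustDebouncer(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** A join/left event replaces any pending task with a new one. */
    void onTopologyEvent(long nowMillis) {
        deadline = nowMillis + timeoutMillis;
    }

    /** Returns true exactly once, when the pending task's timeout has expired. */
    boolean shouldAdjustBaseline(long nowMillis) {
        if (deadline < 0 || nowMillis < deadline)
            return false;

        deadline = -1;

        return true;
    }
}
```
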
>
> First of all we need to add two parameters[4]:
>         - baselineAutoAdjustEnabled - enable/disable "Baseline
> auto-adjust" feature.
>         - baselineAutoAdjustTimeout - timeout after which baseline should
> be changed.
>
> These parameters are cluster-wide and can be changed at runtime because
> they are based on the "Distributed metastore". Initially, these parameters
> are set from the corresponding parameters (initBaselineAutoAdjustEnabled,
> initBaselineAutoAdjustTimeout) in "Ignite Configuration". An init value is
> valid only until the parameter is first changed; after that, the changed
> value is stored in the "Distributed metastore".
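
The rule for init values could be read as "the stored value wins once it
exists". A minimal sketch of that reading, with a hypothetical class that
stands in for the metastore-backed property:

```java
/** Hypothetical cluster-wide property with an init value from node configuration. */
final class AutoAdjustTimeoutProperty {
    private final long initValue; // e.g. taken from initBaselineAutoAdjustTimeout
    private Long stored;          // null until the first cluster-wide change

    AutoAdjustTimeoutProperty(long initValue) {
        this.initValue = initValue;
    }

    /** The init value applies only until the property is changed for the first time. */
    long effectiveValue() {
        return stored != null ? stored : initValue;
    }

    /** In the proposal this write would go to the distributed metastore. */
    void change(long newValue) {
        stored = newValue;
    }
}
```
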
>
> Restrictions:
>         - This mechanism handles events only on an active grid
>         - If baselineNodes != gridNodes on activation, this feature is
> disabled
>         - If lost partitions are detected, this feature is disabled
>         - If the baseline is adjusted manually when baselineNodes !=
> gridNodes, this feature is disabled
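
For reference, the restrictions above can be written down as a small
predicate. The names are hypothetical, and I am assuming that node sets are
compared by consistent ID and that the check for a manual change compares
the baseline and the topology at the time of that change.

```java
import java.util.Set;

/** Hypothetical encoding of the listed restrictions; not taken from the draft PR. */
final class AutoAdjustRestrictions {
    enum Trigger { CLUSTER_ACTIVATED, LOST_PARTITIONS_DETECTED, MANUAL_BASELINE_CHANGE, NODE_JOINED_OR_LEFT }

    /** Returns true if the trigger would switch the feature off. */
    static boolean disablesAutoAdjust(Trigger trigger, Set<String> baselineNodes, Set<String> gridNodes) {
        switch (trigger) {
            case LOST_PARTITIONS_DETECTED:
                return true;

            case CLUSTER_ACTIVATED:
            case MANUAL_BASELINE_CHANGE:
                return !baselineNodes.equals(gridNodes);

            default:
                return false;
        }
    }

    /** Join/left events are handled only on an active grid with the feature enabled. */
    static boolean handlesEvent(boolean gridActive, boolean autoAdjustEnabled) {
        return gridActive && autoAdjustEnabled;
    }
}
```
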
>
> You can find a draft implementation here [5]. Feel free to ask for more
> details and make suggestions.
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches
> [2] https://issues.apache.org/jira/browse/IGNITE-8571
> [3] https://issues.apache.org/jira/browse/IGNITE-10640
> [4] https://issues.apache.org/jira/browse/IGNITE-8573
> [5] https://github.com/apache/ignite/pull/5907
>
> --
> Best regards,
> Anton Kalashnikov
>
>
