ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled
Date Thu, 12 Apr 2018 23:14:44 GMT
Pavel, thanks for the suggestions. They would definitely work. I would
document the one with the event subscription:
https://issues.apache.org/jira/browse/IGNITE-8241

Could you help prepare a sample code snippet with such a listener to be
added to the docs? I know there are some caveats about how such an event
has to be processed.

Ivan, I really like your idea. Alex G., what are your thoughts on this?

--
Denis

On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov <ivan.glukos@gmail.com> wrote:

> Guys,
>
> I have also heard complaints about the absence of an option to
> automatically change the baseline topology. They absolutely make sense.
> What Pavel suggested will work as a workaround. I think that in future
> releases we should give the user an option to enable similar behavior via
> Ignite Configuration.
> It may be called "Baseline Topology change policy". I see it as a
> rule-based language that allows specifying the conditions of a BLT change
> using several parameters: a timeout and the minimum allowed number of
> partition copies left (maybe this option should also be provided at the
> per-cache-group level).
> The policy can also specify conditions for including new nodes in the BLT
> if they are present, including node attribute filters and so on.
>
> What do you think?
>
> Best Regards,
> Ivan Rakov
>
>
> On 12.04.2018 19:41, Pavel Kovalenko wrote:
>
>> Denis,
>>
>> It's just one of the ways to implement it. We can also subscribe to node
>> join / fail events to properly track the downtime of a node.
>>
>> 2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokserfn@gmail.com>:
>>
>>> Denis,
>>>
>>> Using our API we can implement this task as follows:
>>> Every minute:
>>> 1) Get the consistent IDs of all alive server nodes =>
>>> ignite().context().discovery().aliveServerNodes() =>
>>> mapToConsistentIds().
>>> 2) Get the current baseline topology =>
>>> ignite().cluster().currentBaselineTopology()
>>> 3) For each node that is in the baseline but not among the alive server
>>> nodes, check the timeout for this node.
>>> 4) If the timeout is reached, remove the node from the baseline.
>>> 5) If the baseline has changed, set the new baseline =>
>>> ignite().cluster().setNewBaseline()
>>>
>>>
>>> 2018-04-12 2:18 GMT+03:00 Denis Magda <dmagda@apache.org>:
>>>
>>>> Pavel, Val,
>>>>
>>>> So, it means that the rebalancing will be initiated only after an
>>>> administrator removes the failed node from the topology, right?
>>>>
>>>> Next, imagine that you are the IT administrator who has to automate the
>>>> rebalancing activation if a node has failed and has not recovered within
>>>> 1 minute. What would you do, and what does Ignite provide to accomplish
>>>> the task?
>>>>
>>>> --
>>>> Denis
>>>>
>>>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokserfn@gmail.com>
>>>> wrote:
>>>>
>>>>> Denis,
>>>>>
>>>>> In case of an incomplete baseline topology IgniteCache.rebalance() will
>>>>> do nothing, because this event doesn't trigger a partition exchange or
>>>>> an affinity change, so the states of existing partitions are held.
>>>>>
>>>>> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>>>>> valentin.kulichenko@gmail.com>:
>>>>>
>>>>>> Denis,
>>>>>>
>>>>>> In my understanding, in this case you should remove the node from the
>>>>>> BLT, and that will trigger the rebalancing, no?
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dmagda@gridgain.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Igniters,
>>>>>>>
>>>>>>> As we know, rebalancing doesn't happen if one of the nodes goes down,
>>>>>>> thus shrinking the baseline topology. This complies with our
>>>>>>> assumption that the node should be recovered soon and there is no
>>>>>>> need to waste CPU/memory/networking resources of the cluster shifting
>>>>>>> the data around.
>>>>>>>
>>>>>>> However, there are always edge cases. I was reasonably asked how to
>>>>>>> trigger the rebalancing within the baseline topology manually or on
>>>>>>> timeout if:
>>>>>>>
>>>>>>>     - It's not expected that the failed node will be resurrected in
>>>>>>>     the near future, and
>>>>>>>     - It's not likely that that node will be replaced by another one.
>>>>>>>
>>>>>>> The question: if I call IgniteCache.rebalance() or configure
>>>>>>> CacheConfiguration.rebalanceTimeout, will the rebalancing be fired
>>>>>>> within the baseline topology?
>>>>>>>
>>>>>>> --
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>
>
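
The five-step polling procedure that Pavel outlines in the quoted thread
could be sketched roughly as follows. This is only an illustration of the
timeout bookkeeping, not Ignite code: nodes are represented by plain string
consistent IDs, the class name BaselineTimeoutTracker and its check() method
are hypothetical, and none of the cluster calls named in the message are
invoked. The caller is assumed to obtain the baseline and alive-node sets
itself and to apply the returned set, presumably through the public
IgniteCluster.setBaselineTopology() call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: tracks how long each baseline node has been absent. */
class BaselineTimeoutTracker {
    private final long timeoutMillis;

    /** Consistent ID -> time at which the node was first seen missing. */
    private final Map<String, Long> missingSince = new HashMap<>();

    BaselineTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * One polling pass (steps 3-5 of the procedure).
     *
     * @param baseline  consistent IDs of the current baseline nodes (step 2)
     * @param alive     consistent IDs of the alive server nodes (step 1)
     * @param nowMillis current time supplied by the caller
     * @return the baseline without the nodes whose timeout has expired;
     *         equal to {@code baseline} when nothing has to change
     */
    Set<String> check(Set<String> baseline, Set<String> alive, long nowMillis) {
        // Forget nodes that came back or are no longer part of the baseline.
        missingSince.keySet().removeIf(
            id -> alive.contains(id) || !baseline.contains(id));

        Set<String> result = new HashSet<>(baseline);

        for (String id : baseline) {
            if (alive.contains(id))
                continue;

            // Start the timer on the first pass that sees the node missing.
            long since = missingSince.computeIfAbsent(id, k -> nowMillis);

            if (nowMillis - since >= timeoutMillis) {
                result.remove(id);       // step 4: drop the expired node
                missingSince.remove(id);
            }
        }

        return result;
    }
}
```

A scheduler would call check() once per polling interval and change the
baseline only when the returned set differs from the one passed in. A node
that reappears before its timeout expires has its timer reset, so a brief
restart does not count towards a later outage.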
