helix-user mailing list archives

From Sesh Jalagam <sjala...@box.com>
Subject Re: Messages building up in helix
Date Mon, 28 Nov 2016 22:20:42 GMT
I like the last idea.

When an instance shuts down today, it does the following:
- disable instance
- drop instance

Instead, I can do the following.
When an instance shuts down:
- disable instance

A reaper thread on the controller (this can be present on any of the
current live instances) wakes up, looks for all disabled instances, and
drops them.
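
Something like this sketch, using ZKHelixAdmin (the ZooKeeper address,
cluster name, and how the task gets scheduled are placeholders):

    import java.util.List;

    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.InstanceConfig;

    // Hypothetical reaper task: drops every instance that disabled itself
    // on shutdown. Runs periodically on whichever node is the leader.
    public class DisabledInstanceReaper implements Runnable {
      private final ZKHelixAdmin admin =
          new ZKHelixAdmin("zk-host:2181"); // placeholder address
      private final String cluster = "myCluster"; // placeholder name

      @Override
      public void run() {
        List<String> instances = admin.getInstancesInCluster(cluster);
        for (String instance : instances) {
          InstanceConfig config = admin.getInstanceConfig(cluster, instance);
          // In our setup a disabled instance never comes back with the
          // same name, so it should be safe to drop it.
          if (!config.getInstanceEnabled()) {
            // dropInstance removes everything under
            // <cluster>/INSTANCES/<instance> (MESSAGES, CURRENTSTATES, ...)
            admin.dropInstance(cluster, config);
          }
        }
      }
    }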

- Sesh .J


On Mon, Nov 28, 2016 at 2:06 PM, kishore g <g.kishore@gmail.com> wrote:

> If you know that the instance will never come back up with the same name,
> you can do the following:
>
> - disable the instance
> - wait for all partitions hosted by this instance to reach the
> OFFLINE/DROPPED state
> - disconnect from the cluster
> - Use ZkHelixAdmin to drop the instance from the cluster. This should
> clean up everything related to the old node.
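>
> A rough sketch of that sequence (the ZooKeeper address, the names, and
> the one-second polling loop are placeholders, not a definitive
> implementation):
>
>     import java.util.Map;
>
>     import org.apache.helix.HelixManager;
>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>     import org.apache.helix.model.ExternalView;
>
>     class GracefulShutdown {
>       // Hypothetical helper, run by the participant during shutdown.
>       static void shutDown(HelixManager participant, String zkAddr,
>           String cluster, String instance) throws InterruptedException {
>         ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
>
>         // 1. Disable the instance; the controller will move its
>         //    partitions to OFFLINE/DROPPED.
>         admin.enableInstance(cluster, instance, false);
>
>         // 2. Wait for the external views to reflect that.
>         while (hostsActivePartition(admin, cluster, instance)) {
>           Thread.sleep(1000);
>         }
>
>         // 3. Disconnect from the cluster.
>         participant.disconnect();
>
>         // 4. Drop the instance; this cleans up everything under
>         //    <cluster>/INSTANCES/<instance>.
>         admin.dropInstance(cluster,
>             admin.getInstanceConfig(cluster, instance));
>       }
>
>       // True while any external view still shows this instance in a
>       // state other than OFFLINE/DROPPED.
>       static boolean hostsActivePartition(ZKHelixAdmin admin,
>           String cluster, String instance) {
>         for (String resource : admin.getResourcesInCluster(cluster)) {
>           ExternalView ev = admin.getResourceExternalView(cluster, resource);
>           if (ev == null) continue;
>           for (String partition : ev.getPartitionSet()) {
>             Map<String, String> stateMap = ev.getStateMap(partition);
>             String state = stateMap == null ? null : stateMap.get(instance);
>             if (state != null && !state.equals("OFFLINE")
>                 && !state.equals("DROPPED")) {
>               return true;
>             }
>           }
>         }
>         return false;
>       }
>     }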
>
> You can also do this via the controller node: watch for liveinstances, and
> if nodes are not present under liveinstances you can delete those nodes.
> One suggestion here: when a node shuts down, write the state to the
> instanceConfig of that node, say STATE="SHUTDOWN". Your reaper thread can
> look for nodes that are in this state and invoke admin.dropInstance.
>
> dropInstance will take care of cleaning up everything related to a dead
> node.
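>
> In code that marker could look roughly like this (setInstanceConfig is
> assumed here; if your Helix version lacks it, write the config back
> through a HelixDataAccessor instead):
>
>     // Participant side, during shutdown: mark ourselves as SHUTDOWN.
>     InstanceConfig config = admin.getInstanceConfig(cluster, instance);
>     config.getRecord().setSimpleField("STATE", "SHUTDOWN");
>     admin.setInstanceConfig(cluster, instance, config); // assumed API
>
>     // Reaper side: drop every instance that marked itself SHUTDOWN.
>     for (String name : admin.getInstancesInCluster(cluster)) {
>       InstanceConfig c = admin.getInstanceConfig(cluster, name);
>       if ("SHUTDOWN".equals(c.getRecord().getSimpleField("STATE"))) {
>         admin.dropInstance(cluster, c);
>       }
>     }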
>
>
>
>
> On Mon, Nov 28, 2016 at 1:56 PM, Sesh Jalagam <sjalagam@box.com> wrote:
>
>> Kishore thanks,
>>
>> Option 1 and Option 3 are plausible. Option 2 is not feasible: even
>> though the cluster name is the same, the instance name is different
>> (usually this is a random value).
>>
>> With Option 1, what should I be looking at in the External View? Should I
>> be looking at all the resources that should have been transitioned off?
>>
>> With Option 3, when the cluster is redeployed the controller is moving
>> around (because of leader election) from node to node, so I wonder if
>> the controller will miss any messages for dead nodes. Or I can simply
>> have a reaper that comes up and deletes all messages that are destined
>> for instances that are not present in /LIVEINSTANCES/.
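>>
>> Roughly what I have in mind (a sketch; manager is assumed to be an
>> already-connected ADMINISTRATOR HelixManager, and iterating the
>> instance configs assumes the dead instances still have configs):
>>
>>     import java.util.HashSet;
>>     import java.util.Set;
>>
>>     import org.apache.helix.HelixDataAccessor;
>>     import org.apache.helix.PropertyKey;
>>
>>     HelixDataAccessor accessor = manager.getHelixDataAccessor();
>>     PropertyKey.Builder kb = accessor.keyBuilder();
>>
>>     // Instances that are currently alive.
>>     Set<String> live =
>>         new HashSet<>(accessor.getChildNames(kb.liveInstances()));
>>
>>     // Delete every message addressed to an instance that is not live.
>>     for (String instance : accessor.getChildNames(kb.instanceConfigs())) {
>>       if (!live.contains(instance)) {
>>         for (String msgId : accessor.getChildNames(kb.messages(instance))) {
>>           accessor.removeProperty(kb.message(instance, msgId));
>>         }
>>       }
>>     }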
>>
>> How should I be dealing with <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES?
>> This has stale current states (session ids that are no longer valid).
>>
>>
>>
>> On Mon, Nov 28, 2016 at 12:52 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Looks like nodes add and remove themselves quite often. After you
>>> disable the instance, Helix will send messages to go from ONLINE to
>>> OFFLINE. Looks like the nodes shut down before they get those messages and
>>> when they come back up, they use a different instance id.
>>>
>>> There are two solutions:
>>> - During shut down: after disabling, wait for the state to be reflected
>>> in the External View.
>>> - During start up: if possible, re-join the cluster with the same name.
>>> If you do that, Helix will remove old messages.
>>>
>>> A third option is to support autoCleanUp in Helix. Helix controller can
>>> monitor the cluster for dead nodes and remove them automatically after some
>>> time.
>>>
>>>
>>>
>>> On Mon, Nov 28, 2016 at 12:39 PM, Sesh Jalagam <sjalagam@box.com> wrote:
>>>
>>>> <clustername>/INSTANCES/INSTANCES/MESSAGES has already-read messages.
>>>>
>>>> Here is an example.
>>>>     ,"FROM_STATE":"ONLINE"
>>>>     ,"MSG_STATE":"read"
>>>>     ,"MSG_TYPE":"STATE_TRANSITION"
>>>>     ,"STATE_MODEL_DEF":"OnlineOffline"
>>>>     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
>>>>     ,"TO_STATE":"OFFLINE
>>>>
>>>> I see these messages after the participant is disabled and dropped,
>>>> i.e. <clustername>/INSTANCES/<PARTICIPANT_ID> is removed.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Mon, Nov 28, 2016 at 12:18 PM, kishore g <g.kishore@gmail.com>
>>>> wrote:
>>>>
>>>>> By <clustername>/INSTANCES/INSTANCES/MESSAGES do you mean
>>>>> <clustername>/INSTANCES/<PARTICIPANT_ID>/MESSAGES?
>>>>>
>>>>> What kind of messages do you see under these nodes?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 28, 2016 at 12:04 PM, Sesh Jalagam <sjalagam@box.com>
>>>>> wrote:
>>>>>
>>>>>> Our setup is the following.
>>>>>>
>>>>>> - Controller (leader elected from one of the cluster nodes)
>>>>>>
>>>>>> - Cluster of nodes as participants in OnlineOffline StateModel
>>>>>>
>>>>>> - Set of resources with partitions.
>>>>>>
>>>>>>
>>>>>> Each node, on its startup, creates a controller, adds a participant
>>>>>> if it does not already exist, and waits for the callbacks to handle
>>>>>> partition rebalancing.
>>>>>>
>>>>>> Please note this cluster is created on the fly multiple times a day
>>>>>> (the actual cluster is not deleted, but participants are removed and
>>>>>> re-added).
>>>>>>
>>>>>>
>>>>>> Everything works fine in production, but I see that the znodes
>>>>>> in <clustername>/INSTANCES/INSTANCES/MESSAGES are growing.
>>>>>>
>>>>>> What is <cluster_id>/INSTANCES/INSTANCES used for? Is there a way
>>>>>> for the messages to be deleted automatically?
>>>>>>
>>>>>> I see a similar buildup in <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> --
>>>>>> - Sesh .J
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> - Sesh .J
>>>>
>>>
>>>
>>
>>
>> --
>> - Sesh .J
>>
>
>


-- 
- Sesh .J
