helix-user mailing list archives

From Vinoth Chandar <vin...@uber.com>
Subject Re: After dropResource , still able to listResourceInfo
Date Fri, 22 May 2015 16:57:30 GMT
For me, I think this is #1. Everything is in the OFFLINE state. So should
dropResource be run while the cluster is up and running (given that we
have embedded controllers)?

On Wed, May 20, 2015 at 3:42 PM, Hang Qi <hangq.1985@gmail.com> wrote:

> Hi Kishore,
>
> Fortunately, I found the ZK dump file. I believe it is #2.
>
> The paths that contain the dropped resource are in the following format:
>
> /$cluster/INSTANCES/$instance/ERRORS/$sessionId/$resourceName/$partition
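
For anyone scanning a ZK dump for leftovers, the path layout above can be split into its components. This is only a minimal sketch in Python and not part of Helix: `parse_error_path` and `paths_for_resource` are hypothetical helper names, and it assumes the dump has already been reduced to a plain list of znode paths.

```python
from typing import Dict, Iterable, List

# Field names follow the layout quoted above:
# /$cluster/INSTANCES/$instance/ERRORS/$sessionId/$resourceName/$partition
FIELDS = ("cluster", "instance", "session_id", "resource", "partition")


def parse_error_path(path: str) -> Dict[str, str]:
    """Split an instance ERRORS znode path into named parts (hypothetical helper)."""
    parts = path.strip("/").split("/")
    if len(parts) != 7 or parts[1] != "INSTANCES" or parts[3] != "ERRORS":
        raise ValueError(f"not an instance ERRORS path: {path!r}")
    values = (parts[0], parts[2], parts[4], parts[5], parts[6])
    return dict(zip(FIELDS, values))


def paths_for_resource(paths: Iterable[str], resource: str) -> List[str]:
    """Return the ERRORS paths that still mention the given resource."""
    matches = []
    for path in paths:
        try:
            if parse_error_path(path)["resource"] == resource:
                matches.append(path)
        except ValueError:
            continue  # a path with a different layout; ignore it
    return matches
```

Calling `paths_for_resource(all_paths, "countLog")` would give the candidate paths to inspect before removing anything by hand.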
>
> Thanks
> Hang Qi
>
> On Wed, May 20, 2015 at 1:41 PM, kishore g <g.kishore@gmail.com> wrote:
>
>> Hi,
>>
>> Here is what is happening in the code.
>>
>> listClusterInfo gets the resources under /IDEALSTATE
>> listResourceInfo dumps the information for Resource from
>> /IDEALSTATE/<resourceName> and /EXTERNALVIEW/<resourceName>
>>
>> This is what happens behind the scenes when we drop a resource.
>>
>>    - The IdealState is deleted first.
>>    - The controller first brings all partitions to their initial state
>>    (OFFLINE) and then fires the OFFLINE->DROPPED transition. Once the
>>    OFFLINE->DROPPED transition is successfully processed for a partition,
>>    its entry is deleted from the ExternalView.
>>    - After all partitions handle the transitions correctly, the
>>    ExternalView should become empty.
>>    - Once the ExternalView is empty, the controller deletes the ExternalView.
>>
>> If listResourceInfo is still showing the resource, it could be because of
>> one of the following reasons:
>>
>>    1. The partitions have not yet reached the DROPPED state. This should
>>    ideally finish in a few seconds, depending on what is done as part of
>>    the OFFLINE->DROPPED transition.
>>    2. One of the partitions went into the ERROR state. In this case, the
>>    resource's external view will continue to exist.
>>    3. No controller is running to delete the external view after all
>>    partitions went to the OFFLINE/DROPPED state.
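
The three cases above can be told apart from a few observations about the cluster. The following is a rough sketch in Python, not Helix code: `diagnose_dropped_resource` is a hypothetical function, and it assumes the caller has already read whether the ideal state exists, the external view as a partition-to-instance-to-state map (or `None` if the znode is gone), and whether any controller is alive.

```python
from typing import Dict, Optional

# partition name -> {instance name -> current state}
ExternalView = Dict[str, Dict[str, str]]


def diagnose_dropped_resource(
    ideal_state_exists: bool,
    external_view: Optional[ExternalView],
    controller_running: bool,
) -> str:
    """Map observations onto the three cases listed above (hypothetical helper)."""
    if ideal_state_exists:
        return "not dropped: ideal state still present"
    if external_view is None:
        return "dropped: external view already deleted"

    # Collect every replica state still recorded in the external view.
    states = {
        state
        for replicas in external_view.values()
        for state in replicas.values()
    }
    if "ERROR" in states:
        return "#2: a partition is in ERROR, external view stays"
    if not controller_running:
        return "#3: no controller running to clean up the external view"
    return "#1: drop still in progress"
```

The ERROR check comes first because, per the list above, an ERROR partition keeps the external view around whether or not a controller is running.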
>>
>> Vinoth's case is #3. Hang, do you remember if your case was #1 or #2?
>>
>>
>> Thanks,
>> Kishore G
>>
>>
>>
>> On Wed, May 20, 2015 at 1:18 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>>
>>> No, we have dedicated controllers.
>>>
>>> We first created one resource, and later on we decided to create a new
>>> one and dropped the previous one. After the drop, listClusterInfo did not
>>> show that resource, but listResourceInfo still returned data for the
>>> dropped one. Meanwhile, the application was still receiving
>>> callbacks/transitions for the dropped resource.
>>>
>>> Thanks
>>> Hang Qi
>>>
>>> On Wed, May 20, 2015 at 6:44 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>
>>>> Kishore and I chatted offline. The problem seems to be that there is
>>>> still an external view for the resource, which Kishore tells me persists
>>>> until a controller comes back up. (Other info: no live instances around.)
>>>>
>>>> I am running my app with a distributed/embedded controller, which means
>>>> that when I shut down my instances the controller(s) died as well. I will
>>>> try to reproduce this locally and report back.
>>>>
>>>> @Hang, does this have any similarity to your usage?
>>>>
>>>> On Tue, May 19, 2015 at 1:43 PM, Vinoth Chandar <vinoth@uber.com>
>>>> wrote:
>>>>
>>>>> I did a ZK dump before I cleared everything out.. Will investigate and
>>>>> send more info out..
>>>>>
>>>>> @Kishore, dropResource did not error out.. My memory is vague as it
>>>>> was the middle of the night :), but I think I shut everything down
>>>>> before I issued the CLI command.
>>>>>
>>>>> Thanks
>>>>> Vinoth
>>>>>
>>>>> On Tue, May 19, 2015 at 12:50 PM, Hang Qi <hangq.1985@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Vinoth,
>>>>>>
>>>>>> We ran into this issue before. We used zk-dumper.sh to dump
>>>>>> everything inside ZK, looked for where this resource still existed, and
>>>>>> removed those paths in ZK. That worked.
>>>>>>
>>>>>> Unfortunately, we did not keep the state, so it would be great if you
>>>>>> could share the paths that contain the resource you dropped. That would
>>>>>> be helpful for debugging.
>>>>>>
>>>>>> Thanks
>>>>>> Hang Qi
>>>>>>
>>>>>> On Tue, May 19, 2015 at 11:10 AM, Vinoth Chandar <vinoth@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I dropped the resource already, but still seeing callbacks firing..
>>>>>>> I cannot list the resource using listResources.
>>>>>>>
>>>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>>>> --dropResource streamio countLog
>>>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>>>> --listResourceInfo streamio countLog | tail -10
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "NUM_PARTITIONS" : "4096",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>>>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>>>>>   }
>>>>>>> }
>>>>>>> $ bin/helix-admin.sh --zkSvr zkmaster:2181 --listResources streamio
>>>>>>> | grep countLog | wc -l
>>>>>>> 0
>>>>>>>
>>>>>>> Any idea how to troubleshoot this?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinoth
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Qi hang
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Qi hang
>>>
>>
>>
>
>
> --
> Qi hang
>
