helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: After dropResource , still able to listResourceInfo
Date Wed, 20 May 2015 20:41:21 GMT
Hi,

Here is what is happening in the code.

listClusterInfo gets the resources under /IDEALSTATE
listResourceInfo dumps the information for Resource from
/IDEALSTATE/<resourceName> and /EXTERNALVIEW/<resourceName>
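To make the difference concrete, here is a toy model in Python. It is not Helix code: the dict stands in for ZooKeeper, the paths use the simplified notation above, and both function names are hypothetical.

```python
def list_cluster_resources(znodes):
    """Resources as listClusterInfo would see them: children of /IDEALSTATE only."""
    prefix = "/IDEALSTATE/"
    return sorted(p[len(prefix):] for p in znodes if p.startswith(prefix))


def list_resource_info(znodes, resource):
    """Whatever exists for the resource under /IDEALSTATE and /EXTERNALVIEW."""
    info = {}
    for section in ("IDEALSTATE", "EXTERNALVIEW"):
        path = f"/{section}/{resource}"
        if path in znodes:
            info[section] = znodes[path]
    return info


# Assumed state after a drop: the ideal state is gone, the external view remains.
znodes = {"/EXTERNALVIEW/countLog": {"countLog_0": "OFFLINE"}}
print(list_cluster_resources(znodes))
print(list_resource_info(znodes, "countLog"))
```

In this model the resource is absent from the cluster listing but still has something to dump, which matches the reported symptom.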

This is what happens behind the scenes when we drop a resource.

   - The IdealState is deleted first.
   - The controller first brings all partitions to their initial state
   (OFFLINE) and then fires the OFFLINE->DROPPED transition. Once the
   OFFLINE->DROPPED transition is successfully processed for a partition,
   its entry is deleted from the ExternalView.
   - After all partitions handle the transitions correctly, the
   ExternalView should become empty.
   - Once the ExternalView is empty, the controller deletes the ExternalView.
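The steps above can be sketched as a small simulation. This is a simplified model, not the controller's actual implementation: the external view is reduced to a partition -> state dict (one replica per partition), and `drop_resource` and its `fail_partitions` parameter are hypothetical names used only for illustration.

```python
def drop_resource(znodes, resource, fail_partitions=()):
    """Toy walk-through of the drop steps.

    Returns True if the external view is gone at the end.
    """
    # Step 1: the ideal state is deleted first.
    znodes.pop(f"/IDEALSTATE/{resource}", None)

    ev_path = f"/EXTERNALVIEW/{resource}"
    external_view = znodes.get(ev_path, {})

    # Step 2: each partition goes to OFFLINE, then OFFLINE->DROPPED.
    for partition in list(external_view):
        if partition in fail_partitions:
            # Assumption: a failed transition leaves the entry behind in ERROR.
            external_view[partition] = "ERROR"
        else:
            # A successful OFFLINE->DROPPED removes the partition's entry.
            del external_view[partition]

    # Steps 3 and 4: only an empty external view gets deleted.
    if not external_view and ev_path in znodes:
        del znodes[ev_path]
    return ev_path not in znodes
```

If any partition keeps an entry, the last step never runs and the external view stays in place.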

If listResourceInfo is still showing the resource, it could be because of
one of the following reasons:

   1. The partitions have not yet reached the DROPPED state. This should
   ideally finish in a few seconds, depending on what is done as part of
   the OFFLINE->DROPPED transition.
   2. One of the partitions went into the ERROR state. In this case, the
   resource's external view will continue to exist.
   3. No controller is running to delete the external view after all
   partitions went to the OFFLINE/DROPPED state.
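As a rough aid for telling these cases apart, the checks can be written down as below. This is only a heuristic sketch under stated assumptions: `diagnose` is a hypothetical helper, the external view is simplified to a partition -> state dict, and whether a controller is running has to be determined separately.

```python
def diagnose(external_view, controller_running):
    """Map a leftover external view to case 1, 2 or 3 from the list above.

    external_view: dict of partition -> state, or None if it no longer exists.
    Returns 0 when there is no leftover external view to explain.
    """
    if external_view is None:
        return 0
    if "ERROR" in external_view.values():
        return 2  # a partition is stuck in ERROR
    if not controller_running:
        return 3  # nobody is around to finish the cleanup
    return 1      # transitions are presumably still in progress
```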

Vinoth's case is #3. Hang, do you remember if your case was #1 or #2?


Thanks,
Kishore G



On Wed, May 20, 2015 at 1:18 PM, Hang Qi <hangq.1985@gmail.com> wrote:

> No, we have dedicated controllers.
>
> We first created one resource, later decided to create a new one, and
> dropped the previous one. After the drop, listClusterInfo did not show
> that resource, but listResourceInfo still returned information for the
> dropped one. Meanwhile, the application was still receiving
> callbacks/transitions for the dropped resource.
>
> Thanks
> Hang Qi
>
> On Wed, May 20, 2015 at 6:44 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>
>> Kishore and I chatted offline. The problem seems to be that there is
>> still an external view for the resource, which Kishore tells me will exist
>> until a controller comes back up. (Other info: no live instances around.)
>>
>> I am running my app with a distributed/embedded controller, which means
>> when I shut down my instances the controller(s) died as well. I will try to
>> reproduce this locally and report back.
>>
>> @Hang, does this have any similarity to your usage?
>>
>> On Tue, May 19, 2015 at 1:43 PM, Vinoth Chandar <vinoth@uber.com> wrote:
>>
>>> I did a ZK dump before I cleared everything out.. Will investigate and
>>> send more info out..
>>>
>>> @Kishore, dropResource did not error out.. My memory is vague as it was
>>> middle of the night :), but I think I shut everything down before I issued
>>> the CLI command.
>>>
>>> Thanks
>>> Vinoth
>>>
>>> On Tue, May 19, 2015 at 12:50 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>>>
>>>> Hi Vinoth,
>>>>
>>>> We hit this issue before. We used zk-dumper.sh to dump everything in
>>>> ZK, found where the resource still existed, and removed those paths
>>>> from ZK; that worked.
>>>>
>>>> Unfortunately, we did not keep the state, so it would be great if you
>>>> could share the paths that contain the resource you dropped. That would
>>>> be helpful for debugging.
>>>>
>>>> Thanks
>>>> Hang Qi
>>>>
>>>> On Tue, May 19, 2015 at 11:10 AM, Vinoth Chandar <vinoth@uber.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I dropped the resource already, but still seeing callbacks firing.. I
>>>>> cannot list the resource using listResources.
>>>>>
>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>> --dropResource streamio countLog
>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>> --listResourceInfo streamio countLog | tail -10
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "NUM_PARTITIONS" : "4096",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>>>   }
>>>>> }
>>>>> $ bin/helix-admin.sh --zkSvr zkmaster:2181 --listResources streamio |
>>>>> grep countLog | wc -l
>>>>> 0
>>>>>
>>>>> Any idea how to troubleshoot this?
>>>>>
>>>>> Thanks
>>>>> Vinoth
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Qi hang
>>>>
>>>
>>>
>>
>
>
> --
> Qi hang
>
