helix-user mailing list archives

From Hang Qi <hangq.1...@gmail.com>
Subject Re: After dropResource , still able to listResourceInfo
Date Wed, 20 May 2015 22:42:03 GMT
Hi Kishore,

Fortunately, I found the zk dump file. I believe our case was #2.

The paths that contain the dropped resource are in the following format:

/$cluster/INSTANCES/$instance/ERRORS/$sessionId/$resourceName/$partition
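
For anyone who wants to check their own dump for the same leftovers, here is a minimal sketch, assuming the dump has already been flattened into one ZK path per line. The regex simply mirrors the format above; find_error_paths and the sample instance/session names are hypothetical and not part of Helix.

```python
import re

# Assumed layout, copied from the path format above:
# /<cluster>/INSTANCES/<instance>/ERRORS/<sessionId>/<resourceName>/<partition>
ERROR_PATH = re.compile(
    r"^/(?P<cluster>[^/]+)/INSTANCES/(?P<instance>[^/]+)/ERRORS/"
    r"(?P<session>[^/]+)/(?P<resource>[^/]+)/(?P<partition>[^/]+)$"
)


def find_error_paths(paths, cluster, resource):
    """Return the ERRORS paths that belong to the given cluster and resource."""
    matches = []
    for raw in paths:
        path = raw.strip()  # tolerate whitespace/newlines from a text dump
        m = ERROR_PATH.match(path)
        if m and m.group("cluster") == cluster and m.group("resource") == resource:
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Hypothetical sample paths; instance and session names are made up.
    sample = [
        "/streamio/INSTANCES/host1_12000/ERRORS/abc123/countLog/countLog_7",
        "/streamio/INSTANCES/host1_12000/ERRORS/abc123/otherRes/otherRes_0",
    ]
    for p in find_error_paths(sample, "streamio", "countLog"):
        print(p)
```

Whatever it prints is a candidate for manual cleanup; it only reads the list and does not touch ZK.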

Thanks
Hang Qi

On Wed, May 20, 2015 at 1:41 PM, kishore g <g.kishore@gmail.com> wrote:

> Hi,
>
> Here is what is happening in the code.
>
> listClusterInfo gets the resources under /IDEALSTATE
> listResourceInfo dumps the information for Resource from
> /IDEALSTATE/<resourceName> and /EXTERNALVIEW/<resourceName>
>
> This is what happens behind the scenes when we drop a resource.
>
>    - Idealstate is deleted first
>    - Controller first brings all partitions to their initial state
>    (OFFLINE) and then fires the OFFLINE->DROPPED transition. Once the
>    OFFLINE->DROPPED transition is successfully processed, the partition's
>    entry is deleted from the ExternalView.
>    - After all partitions handle the transitions correctly, the
>    ExternalView should become empty.
>    - Once the ExternalView is empty, controller deletes the ExternalView.
>
> If listResourceInfo is still showing the resource, it could be because of
> one of the following reasons:
>
>    1. The partitions have not yet reached the DROPPED state. This should
>    ideally finish in a few seconds, depending on what is done as part of
>    the OFFLINE->DROPPED transition.
>    2. One of the partitions went into the ERROR state. In this case, the
>    resource's external view will continue to exist.
>    3. No controller running to delete the external view after all
>    partitions went to OFFLINE/DROPPED state.
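
The three cases above can be sketched as a small check. This is only an illustration of the reasoning in the list, not Helix code: diagnose_lingering_view is a hypothetical helper, and it assumes the external view is given as a plain partition -> state mapping from which dropped partitions have already been removed.

```python
def diagnose_lingering_view(external_view, controller_running):
    """Return 1, 2 or 3 for the matching case in the list, or None if none applies.

    external_view: dict of partition name -> current state (assumed shape);
                   partitions that completed OFFLINE->DROPPED are absent.
    controller_running: whether any controller is alive for the cluster.
    """
    # Case 2: a partition in ERROR keeps its entry, so the view never empties.
    if any(state == "ERROR" for state in external_view.values()):
        return 2
    # Case 3: nobody is around to drive transitions or delete the view.
    if not controller_running:
        return 3
    # Case 1: transitions are still in flight.
    if external_view:
        return 1
    # Empty view with a live controller: it should be deleted shortly.
    return None


if __name__ == "__main__":
    print(diagnose_lingering_view({"countLog_0": "ERROR"}, True))
```
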
>
> Vinoth's case is #3. Hang, do you remember if your case was #1 or #2?
>
>
> Thanks,
> Kishore G
>
>
>
> On Wed, May 20, 2015 at 1:18 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>
>> No, we have dedicated controllers.
>>
>> We first created one resource, later decided to create a new one, and
>> dropped the previous one. After the drop, listClusterInfo did not show
>> that resource, but listResourceInfo still worked for the dropped one.
>> Meanwhile, in the application, we were still receiving
>> callbacks/transitions for the dropped resource.
>>
>> Thanks
>> Hang Qi
>>
>> On Wed, May 20, 2015 at 6:44 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>>
>>> Kishore and I chatted offline. The problem seems to be that there is
>>> still an external view for the resource, which Kishore tells me exists
>>> until a controller comes back up. (other info: no live instances around)
>>>
>>> I am running my app with a distributed/embedded controller, which means
>>> when I shut down my instances the controller(s) died as well. I will try to
>>> reproduce this locally and report back.
>>>
>>> @Hang, does this have any similarity to your usage?
>>>
>>> On Tue, May 19, 2015 at 1:43 PM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>
>>>> I did a ZK dump before I cleared everything out.. Will investigate and
>>>> send more info out..
>>>>
>>>> @Kishore, dropResource did not error out.. My memory is vague as it was
>>>> middle of the night :), but I think I shut everything down before I issued
>>>> the CLI command.
>>>>
>>>> Thanks
>>>> Vinoth
>>>>
>>>> On Tue, May 19, 2015 at 12:50 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>>>>
>>>>> Hi Vinoth,
>>>>>
>>>>> We ran into this issue before. What we did was use zk-dumper.sh to
>>>>> dump everything inside ZK, see where the resource still existed, and
>>>>> remove those paths in ZK. That worked.
>>>>>
>>>>> Unfortunately, we did not keep the state, so it would be great if you
>>>>> could share the paths that contain the resource you dropped. That would
>>>>> be helpful for debugging.
>>>>>
>>>>> Thanks
>>>>> Hang Qi
>>>>>
>>>>> On Tue, May 19, 2015 at 11:10 AM, Vinoth Chandar <vinoth@uber.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I dropped the resource already, but I am still seeing callbacks
>>>>>> firing. I cannot list the resource using listResources.
>>>>>>
>>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>>> --dropResource streamio countLog
>>>>>> $:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181
>>>>>> --listResourceInfo streamio countLog | tail -10
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "NUM_PARTITIONS" : "4096",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>>>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>>>>>   }
>>>>>> }
>>>>>> $ bin/helix-admin.sh --zkSvr zkmaster:2181 --listResources streamio |
>>>>>> grep countLog | wc -l
>>>>>> 0
>>>>>>
>>>>>> Any idea how to troubleshoot this?
>>>>>>
>>>>>> Thanks
>>>>>> Vinoth
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Qi hang
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Qi hang
>>
>
>


-- 
Qi hang
