helix-user mailing list archives

From Vinayak Borkar <vbo...@yahoo.com>
Subject Re: Resource Partition Failure
Date Thu, 18 Apr 2013 06:34:18 GMT
Kishore,

Thanks for the explanation. I saw that HelixAdmin had calls to reset
partitions from the ERROR state to the initial state, so I was wondering
whether it would be a good idea for the instance itself to move a
partition to the ERROR state. But Ming's answer and your explanation
obviate the need for that.


Thanks,
Vinayak
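The reset path mentioned above can be pictured with a toy model. This is hypothetical code, not Helix's implementation: it tracks one replica state per (instance, partition) pair using OnlineOffline-style names. A transition whose handler fails leaves the replica in ERROR, and an admin-initiated reset moves only ERROR replicas back to the initial OFFLINE state. In Helix itself the corresponding admin call is, as far as I know, `HelixAdmin.resetPartition(clusterName, instanceName, resourceName, partitionNames)`; verify the exact signature for your Helix version. The class names `ReplicaState` and `ToyStateTable` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Helix code: OnlineOffline-style state names.
enum ReplicaState { OFFLINE, ONLINE, ERROR }

class ToyStateTable {
    private final Map<String, ReplicaState> states = new HashMap<>();

    private static String key(String instance, String partition) {
        return instance + "/" + partition;
    }

    // Replicas that were never transitioned are in the initial state, OFFLINE.
    ReplicaState get(String instance, String partition) {
        return states.getOrDefault(key(instance, partition), ReplicaState.OFFLINE);
    }

    // A transition either reaches its target or, if the participant's
    // handler failed, leaves the replica in ERROR.
    void transition(String instance, String partition, ReplicaState target, boolean handlerSucceeded) {
        states.put(key(instance, partition), handlerSucceeded ? target : ReplicaState.ERROR);
    }

    // Admin-initiated reset (toy rule): only ERROR replicas go back to OFFLINE.
    void reset(String instance, String partition) {
        if (get(instance, partition) != ReplicaState.ERROR) {
            throw new IllegalStateException(key(instance, partition) + " is not in ERROR");
        }
        states.put(key(instance, partition), ReplicaState.OFFLINE);
    }
}
```

In this toy, the instance never writes ERROR on its own initiative; ERROR only results from a failed transition, and recovery is an explicit external action.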


On 4/17/13 11:29 PM, kishore g wrote:
> Ming is correct: you can use enablePartition(false) to disable only the
> corrupted partition on that node. This triggers the rebalancer, which
> recomputes the ideal state.
>
> We thought about allowing an instance to move itself into the ERROR state,
> but we were worried that letting an instance change its own state
> automatically is dangerous and makes issues harder to debug.
>
> We do have a mechanism for a participant to ask the controller to
> initiate a transition. For example, you can send a message to the
> controller to disable a partition or an instance. (This is different from
> disabling via HelixAdmin, though the end result is the same.)
>
> I didn't get the second part: "which was then reset by possibly the
> controller".
>
>
>
>
> On Wed, Apr 17, 2013 at 11:00 PM, Vinayak Borkar <vborky@yahoo.com> wrote:
>
>> That sounds more promising. Does disabling a partition trigger ideal state
>> computation to rebalance the cluster?
>>
>> Ideally it would be great if the corrupted instance could move itself to
>> the ERROR state which was then reset by possibly the controller. Is that
>> possible?
>>
>>
>>
>>
>>
>> On 4/17/13 10:55 PM, Ming Fang wrote:
>>
>>> How about HelixAdmin.enablePartition()?
>>>
>>> On Apr 18, 2013, at 1:53 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>>
>>>> Hi Ming Fang,
>>>>
>>>>
>>>> Disabling an instance takes out all the resources hosted on that
>>>> instance. I would like to disable only the corrupted partition,
>>>> without impacting other resources.
>>>>
>>>> Thanks,
>>>> Vinayak
>>>>
>>>>
>>>> On 4/17/13 10:43 PM, Ming Fang wrote:
>>>>
>>>>> Try HelixAdmin.enableInstance()
>>>>>
>>>>> On Apr 18, 2013, at 12:28 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> What is the expected way for a system to indicate to Helix that a
>>>>>> partition of a resource has failed?
>>>>>>
>>>>>> Say the bits on disk of a particular partition are found to be
>>>>>> corrupted. Is there a way to tell Helix that this partition of that
>>>>>> resource needs to "fail" without killing the whole node and hence
>>>>>> destroying all other resources on that machine?
>>>>>>
>>>>>> Thanks,
>>>>>> Vinayak
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
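The difference between disabling a single partition and disabling a whole instance can be illustrated with a toy rebalancer. This is hypothetical code, not Helix's rebalancing algorithm: it assigns each partition to the first `replicas` eligible instances in a rotated order, where an instance is ineligible for a partition if the whole instance is disabled or only that partition is disabled on it. Against a real cluster, the calls discussed in the thread would be, as far as I know, `HelixAdmin.enablePartition(false, clusterName, instanceName, resourceName, partitionNames)` and `HelixAdmin.enableInstance(clusterName, instanceName, false)`; check them against your Helix version. The class `ToyRebalancer` and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy rebalancer: recomputes partition -> instances from scratch.
class ToyRebalancer {
    private final List<String> instances;
    private final int numPartitions;
    private final int replicas;
    private final Set<String> disabledInstances = new HashSet<>();
    // instance -> partitions disabled only on that instance
    private final Map<String, Set<Integer>> disabledPartitions = new HashMap<>();

    ToyRebalancer(List<String> instances, int numPartitions, int replicas) {
        this.instances = new ArrayList<>(instances);
        this.numPartitions = numPartitions;
        this.replicas = replicas;
    }

    // Disables (or re-enables) every partition hosted on the instance.
    void enableInstance(String instance, boolean enabled) {
        if (enabled) disabledInstances.remove(instance); else disabledInstances.add(instance);
    }

    // Disables (or re-enables) a single partition on a single instance.
    void enablePartition(String instance, int partition, boolean enabled) {
        Set<Integer> set = disabledPartitions.computeIfAbsent(instance, k -> new HashSet<>());
        if (enabled) set.remove(partition); else set.add(partition);
    }

    private boolean eligible(String instance, int partition) {
        return !disabledInstances.contains(instance)
            && !disabledPartitions.getOrDefault(instance, Collections.emptySet()).contains(partition);
    }

    // For partition p, walk the instance list starting at offset p and take
    // the first `replicas` eligible instances.
    Map<Integer, List<String>> computeAssignment() {
        Map<Integer, List<String>> assignment = new HashMap<>();
        int n = instances.size();
        for (int p = 0; p < numPartitions; p++) {
            List<String> chosen = new ArrayList<>();
            for (int i = 0; i < n && chosen.size() < replicas; i++) {
                String inst = instances.get((p + i) % n);
                if (eligible(inst, p)) chosen.add(inst);
            }
            assignment.put(p, chosen);
        }
        return assignment;
    }
}
```

With three nodes, three partitions and two replicas, disabling partition 1 on `node2` moves only partition 1's replica to another node, while disabling `node2` entirely also moves partition 0. Helix's actual placement depends on the resource's rebalance mode and is not reproduced by this toy.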

