helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Resource Partition Failure
Date Thu, 18 Apr 2013 06:29:05 GMT
Ming is correct: you can call HelixAdmin.enablePartition() with enabled set
to false to disable only the corrupted partition on that node. This triggers
the rebalancer, which recomputes the ideal state.
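
A minimal sketch of how a participant-side health check might wrap that call.
The names CorruptPartitionHandler, PartitionAdmin and RecordingAdmin are
hypothetical, as are the cluster, instance and partition names. PartitionAdmin
is a narrow stand-in assumed to mirror the parameter order of
HelixAdmin.enablePartition (enabled, cluster, instance, resource, partitions);
please check it against the Helix version you run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for the single admin call used here. It is assumed to
 * match HelixAdmin.enablePartition(enabled, cluster, instance, resource,
 * partitions); verify against your Helix version.
 */
interface PartitionAdmin {
    void enablePartition(boolean enabled, String clusterName, String instanceName,
                         String resourceName, List<String> partitionNames);
}

/** Hypothetical helper that disables or re-enables one partition on one node. */
final class CorruptPartitionHandler {
    private final PartitionAdmin admin;
    private final String clusterName;
    private final String instanceName;

    CorruptPartitionHandler(PartitionAdmin admin, String clusterName, String instanceName) {
        this.admin = admin;
        this.clusterName = clusterName;
        this.instanceName = instanceName;
    }

    /** Disable only the given partition; other partitions on the node are untouched. */
    void disable(String resourceName, String partitionName) {
        admin.enablePartition(false, clusterName, instanceName, resourceName,
                Collections.singletonList(partitionName));
    }

    /** Re-enable the partition once the data has been repaired. */
    void reenable(String resourceName, String partitionName) {
        admin.enablePartition(true, clusterName, instanceName, resourceName,
                Collections.singletonList(partitionName));
    }
}

/** In-memory fake that records each call, so the sketch runs without ZooKeeper. */
final class RecordingAdmin implements PartitionAdmin {
    final List<String> calls = new ArrayList<>();

    @Override
    public void enablePartition(boolean enabled, String clusterName, String instanceName,
                                String resourceName, List<String> partitionNames) {
        calls.add(enabled + ":" + clusterName + "/" + instanceName + "/"
                + resourceName + partitionNames);
    }
}
```

If the signature assumption holds, a real HelixAdmin could be plugged in with
a method reference, for example
new CorruptPartitionHandler(helixAdmin::enablePartition, cluster, instance).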

We thought about allowing an instance to move itself into the ERROR state,
but we were worried that letting an instance change its own state
automatically is dangerous and makes issues harder to debug.

We do have a mechanism for the participant to send a request to the
controller to initiate a transition. For example, you can send a message to
the controller to disable a partition or an instance. (This is different
from disabling through Helix admin, though the end result is the same.)

I didn't get the second part: "which was then reset by possibly the
controller".




On Wed, Apr 17, 2013 at 11:00 PM, Vinayak Borkar <vborky@yahoo.com> wrote:

> That sounds more promising. Does disabling a partition trigger ideal state
> computation to rebalance the cluster?
>
> Ideally it would be great if the corrupted instance could move itself to
> the ERROR state which was then reset by possibly the controller. Is that
> possible?
>
>
>
>
>
> On 4/17/13 10:55 PM, Ming Fang wrote:
>
>> how about HelixAdmin.enablePartition()?
>>
>> On Apr 18, 2013, at 1:53 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>
>>  Hi Ming Fang,
>>>
>>>
>>> Enable/Disable instance will take out all the resources hosted on an
>>> instance. I would like to disable only the corrupted partition on the
>>> system without impacting other resources.
>>>
>>> Thanks,
>>> Vinayak
>>>
>>>
>>> On 4/17/13 10:43 PM, Ming Fang wrote:
>>>
>>>> Try HelixAdmin.enableInstance()
>>>>
>>>> On Apr 18, 2013, at 12:28 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>>
>>>>> What is the expected way for a system to indicate to Helix that a
>>>>> partition of a resource has failed?
>>>>>
>>>>> Say the bits on disk of a particular partition are found to be
>>>>> corrupted. Is there a way to tell helix that that partition of that
>>>>> resource needs to "fail" without killing the whole node and hence
>>>>> destroying all other resources on that machine?
>>>>>
>>>>> Thanks,
>>>>> Vinayak
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
