Mailing-List: contact user-help@helix.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@helix.incubator.apache.org
Message-ID: <516F93EA.30007@yahoo.com>
Date: Wed, 17 Apr 2013 23:34:18 -0700
From: Vinayak Borkar <vborky@yahoo.com>
To: user@helix.incubator.apache.org
Subject: Re: Resource Partition Failure
References: <516F767F.80803@yahoo.com>
 <BFA23070-3792-4595-A2EF-B02231C5059E@mac.com> <516F8A42.6000500@yahoo.com>
 <72059E7F-84D7-495E-8898-5D41F1331291@mac.com> <516F8C15.4030407@yahoo.com>
 <CABaj-Qa8X5HW8Yozm0A+DT7B6CUZbiNTKXY4oi9ot+nVU9Er2Q@mail.gmail.com>
In-Reply-To: 
 <CABaj-Qa8X5HW8Yozm0A+DT7B6CUZbiNTKXY4oi9ot+nVU9Er2Q@mail.gmail.com>

Kishore,

Thanks for the explanation. I saw that HelixAdmin has calls to reset
partitions from the ERROR state back to the initial state, so I was
wondering whether it would be a good idea for the instance itself to move
the partition into the ERROR state. But Ming's answer and your explanation
obviate the need for that.
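
For the archives, here is a minimal sketch of the partition-level approach
on the participant side. It assumes a Helix release in which
HelixAdmin.enablePartition takes (boolean enabled, String clusterName,
String instanceName, String resourceName, List<String> partitionNames);
please check the signature in your version. CorruptPartitionHandler and
PartitionToggle are hypothetical names, not Helix classes, and the local
interface exists only so the sketch compiles without Helix on the classpath.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical participant-side helper (not part of Helix): when one
 * partition's on-disk data is found to be corrupted, ask Helix to stop
 * assigning that single partition to this instance, leaving every other
 * partition hosted on the node untouched.
 */
public class CorruptPartitionHandler {

  /**
   * Assumed to mirror the parameter list of HelixAdmin.enablePartition so
   * that helixAdmin::enablePartition can be passed in a real deployment.
   */
  @FunctionalInterface
  public interface PartitionToggle {
    void enablePartition(boolean enabled, String clusterName, String instanceName,
                         String resourceName, List<String> partitionNames);
  }

  private final PartitionToggle admin;
  private final String clusterName;
  private final String instanceName;

  public CorruptPartitionHandler(PartitionToggle admin, String clusterName,
                                 String instanceName) {
    this.admin = admin;
    this.clusterName = clusterName;
    this.instanceName = instanceName;
  }

  /** Disable only the corrupted partition on this instance. */
  public void onCorruptionDetected(String resourceName, String partitionName) {
    admin.enablePartition(false, clusterName, instanceName, resourceName,
        Collections.singletonList(partitionName));
  }

  /** Re-enable the partition once its data has been repaired or wiped. */
  public void onRepaired(String resourceName, String partitionName) {
    admin.enablePartition(true, clusterName, instanceName, resourceName,
        Collections.singletonList(partitionName));
  }
}
```

In a real participant one might create HelixAdmin helixAdmin = new
ZKHelixAdmin(zkAddress) and pass helixAdmin::enablePartition to the
constructor, where zkAddress and the cluster/instance names are
placeholders. Disabling a single partition, rather than calling
enableInstance(false), keeps every other partition on the node online.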


Thanks,
Vinayak


On 4/17/13 11:29 PM, kishore g wrote:
> Ming is correct: you can use enablePartition(false) to disable only the
> corrupted partition on the node. This triggers the rebalancer, which
> recomputes the ideal state.
>
> We thought about allowing an instance to move itself into the ERROR state,
> but we were worried that letting an instance change its own state
> automatically is dangerous and makes issues harder to debug.
>
> We do have a mechanism for the participant to ask the controller to
> initiate a transition; for example, you can send a message to the
> controller to disable a partition/instance. (This is different from
> disabling via HelixAdmin, though the end result is the same.)
>
> I didn't get the second part: "which was then reset by possibly the
> controller".
>
> On Wed, Apr 17, 2013 at 11:00 PM, Vinayak Borkar <vborky@yahoo.com> wrote:
>
>> That sounds more promising. Does disabling a partition trigger ideal state
>> computation to rebalance the cluster?
>>
>> Ideally it would be great if the corrupted instance could move itself to
>> the ERROR state which was then reset by possibly the controller. Is that
>> possible?
>>
>> On 4/17/13 10:55 PM, Ming Fang wrote:
>>
>>> How about HelixAdmin.enablePartition()?
>>>
>>> On Apr 18, 2013, at 1:53 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>>
>>>> Hi Ming Fang,
>>>>
>>>>
>>>> Enabling/disabling an instance takes out all the resources hosted on
>>>> that instance. I would like to disable only the corrupted partition
>>>> without impacting other resources.
>>>>
>>>> Thanks,
>>>> Vinayak
>>>>
>>>>
>>>> On 4/17/13 10:43 PM, Ming Fang wrote:
>>>>
>>>>> Try HelixAdmin.enableInstance()
>>>>>
>>>>> On Apr 18, 2013, at 12:28 AM, Vinayak Borkar <vborky@yahoo.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> What is the expected way for a system to indicate to Helix that a
>>>>>> partition of a resource has failed?
>>>>>>
>>>>>> Say the bits on disk of a particular partition are found to be
>>>>>> corrupted. Is there a way to tell Helix that that partition of that
>>>>>> resource needs to "fail" without killing the whole node and hence
>>>>>> destroying all other resources on that machine?
>>>>>>
>>>>>> Thanks,
>>>>>> Vinayak
>>>>>>