Message-ID: <516F93EA.30007@yahoo.com>
Date: Wed, 17 Apr 2013 23:34:18 -0700
From: Vinayak Borkar
To: user@helix.incubator.apache.org
Subject: Re: Resource Partition Failure

Kishore,

Thanks for the explanation. I saw that HelixAdmin had calls to reset partitions from error state -> initial state. So I was wondering if moving the partition to error state by the instance itself would be a good idea. But Ming's answer and your explanation obviate the need for that.

Thanks,
Vinayak

On 4/17/13 11:29 PM, kishore g wrote:
> Ming is correct, you can use the enablePartition(false) to disable only the
> corrupted partition on the node. This will trigger the rebalancer, which
> recomputes the ideal state.
>
> We thought about allowing an instance to move itself into the ERROR state,
> but we were worried that giving the instance control to change its state
> automatically is dangerous and makes it harder to debug issues.
>
> We do have a mechanism for the participant to send a request to the
> controller to initiate a transition; for example, you can send a message to
> the controller to disable a partition/instance. (This is different from
> disabling using helix admin, though the end result is the same.)
>
> I didn't get the second part, "which was then reset by possibly the
> controller".
>
> On Wed, Apr 17, 2013 at 11:00 PM, Vinayak Borkar wrote:
>
>> That sounds more promising. Does disabling a partition trigger ideal state
>> computation to rebalance the cluster?
>>
>> Ideally it would be great if the corrupted instance could move itself to
>> the ERROR state which was then reset by possibly the controller. Is that
>> possible?
>>
>> On 4/17/13 10:55 PM, Ming Fang wrote:
>>
>>> How about HelixAdmin.enablePartition()?
>>>
>>> On Apr 18, 2013, at 1:53 AM, Vinayak Borkar wrote:
>>>
>>>> Hi Ming Fang,
>>>>
>>>> Enable/Disable instance will take out all the resources hosted on an
>>>> instance. I would like to disable only the corrupted partition on the
>>>> system without impacting other resources.
>>>>
>>>> Thanks,
>>>> Vinayak
>>>>
>>>> On 4/17/13 10:43 PM, Ming Fang wrote:
>>>>
>>>>> Try HelixAdmin.enableInstance()
>>>>>
>>>>> On Apr 18, 2013, at 12:28 AM, Vinayak Borkar wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> What is the expected way for a system to indicate to Helix that a
>>>>>> partition of a resource has failed?
>>>>>>
>>>>>> Say the bits on disk of a particular partition are found to be
>>>>>> corrupted. Is there a way to tell Helix that that partition of that
>>>>>> resource needs to "fail" without killing the whole node and hence
>>>>>> destroying all other resources on that machine?
>>>>>>
>>>>>> Thanks,
>>>>>> Vinayak
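
For readers of the archive, the approach the thread settles on (disable only the corrupted partition on one instance, leave everything else on the node alone) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Helix code: `PartitionAdmin`, `InMemoryPartitionAdmin`, and `CorruptPartitionHandler` are hypothetical names introduced here so the example runs without Helix or ZooKeeper. The `enablePartition` parameter list is modeled on `HelixAdmin.enablePartition` as I understand it, so check it against the Helix version you actually use. The cluster, instance, resource, and partition names in the usage note are made up.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for the one HelixAdmin call discussed in this thread.
 * The parameter list is an assumption modeled on HelixAdmin.enablePartition;
 * verify it against your Helix version.
 */
interface PartitionAdmin {
    void enablePartition(boolean enabled, String clusterName, String instanceName,
                         String resourceName, List<String> partitionNames);
}

/** In-memory fake that only records which partitions are currently disabled. */
class InMemoryPartitionAdmin implements PartitionAdmin {
    private final Set<String> disabled = new HashSet<>();

    @Override
    public void enablePartition(boolean enabled, String clusterName, String instanceName,
                                String resourceName, List<String> partitionNames) {
        for (String partition : partitionNames) {
            String key = key(clusterName, instanceName, resourceName, partition);
            if (enabled) {
                disabled.remove(key);
            } else {
                disabled.add(key);
            }
        }
    }

    boolean isDisabled(String clusterName, String instanceName,
                       String resourceName, String partitionName) {
        return disabled.contains(key(clusterName, instanceName, resourceName, partitionName));
    }

    private static String key(String cluster, String instance, String resource, String partition) {
        return String.join("/", cluster, instance, resource, partition);
    }
}

/** Disables a single corrupted partition on one instance and re-enables it after repair. */
class CorruptPartitionHandler {
    private final PartitionAdmin admin;
    private final String clusterName;
    private final String instanceName;

    CorruptPartitionHandler(PartitionAdmin admin, String clusterName, String instanceName) {
        this.admin = admin;
        this.clusterName = clusterName;
        this.instanceName = instanceName;
    }

    /** Takes only the named partition out of service; other partitions are untouched. */
    void onCorruptionDetected(String resourceName, String partitionName) {
        admin.enablePartition(false, clusterName, instanceName, resourceName,
                Collections.singletonList(partitionName));
    }

    /** Puts the partition back in service once its data has been repaired. */
    void onPartitionRepaired(String resourceName, String partitionName) {
        admin.enablePartition(true, clusterName, instanceName, resourceName,
                Collections.singletonList(partitionName));
    }
}
```

A caller would construct `new CorruptPartitionHandler(admin, "MyCluster", "localhost_12913")` and invoke `onCorruptionDetected("MyDB", "MyDB_3")` when a checksum fails. Against a real cluster the `admin` would be backed by HelixAdmin rather than the in-memory fake, and, per Kishore's reply, disabling the partition is what triggers the rebalancer to recompute the ideal state.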