Subject: Re: After dropResource , still able to listResourceInfo
From: kishore g
To: "user@helix.apache.org"
Date: Wed, 20 May 2015 13:41:21 -0700
Mailing-List: contact user-help@helix.apache.org; run by ezmlm

Hi,

Here is what is happening in the code.

listClusterInfo gets the resources under /IDEALSTATE
listResourceInfo dumps the information for Resource from /IDEALSTATE/<resourceName> and /EXTERNALVIEW/<resourceName>

This is what happens behind the scenes when we drop a resource.
  • The IdealState is deleted first.
  • The controller first brings all partitions to their initial state (OFFLINE) and then fires the OFFLINE->DROPPED transition. Once the OFFLINE->DROPPED transition is successfully processed for a partition, its entry is deleted from the ExternalView.
  • After all partitions handle the transitions correctly, the ExternalView should become empty.
  • Once the ExternalView is empty, the controller deletes the ExternalView.
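
The sequence above can be sketched as a small toy model. This is not Helix code: the class and method names (Cluster, drop_resource, controller_pass) and the partition names are made up for illustration, and the model only assumes the ordering described in the list.

```python
# Toy model of the drop sequence; hypothetical names, not the Helix API.

class Cluster:
    def __init__(self, resource, partition_states, controller_running=True):
        self.ideal_states = {resource}
        # External view: resource -> {partition name -> current state}.
        self.external_view = {resource: dict(partition_states)}
        self.controller_running = controller_running

    def drop_resource(self, resource):
        # Step 1: the ideal state is deleted first.
        self.ideal_states.discard(resource)

    def controller_pass(self, resource):
        # The remaining steps only happen while a controller is running
        # and only for resources whose ideal state is gone.
        if not self.controller_running or resource in self.ideal_states:
            return
        view = self.external_view.get(resource)
        if view is None:
            return
        for partition, state in list(view.items()):
            if state == "ERROR":
                continue  # a partition stuck in ERROR keeps its entry
            # Partition goes to OFFLINE, then OFFLINE->DROPPED; a successful
            # transition removes its entry from the external view.
            del view[partition]
        if not view:
            # Once the external view is empty, the controller deletes it.
            del self.external_view[resource]


healthy = Cluster("countLog", {"countLog_0": "ONLINE", "countLog_1": "ONLINE"})
healthy.drop_resource("countLog")
healthy.controller_pass("countLog")
print("countLog" in healthy.external_view)  # False

stuck = Cluster("countLog", {"countLog_0": "ONLINE", "countLog_1": "ERROR"})
stuck.drop_resource("countLog")
stuck.controller_pass("countLog")
print(stuck.external_view["countLog"])  # {'countLog_1': 'ERROR'}

no_controller = Cluster("countLog", {"countLog_0": "ONLINE"},
                        controller_running=False)
no_controller.drop_resource("countLog")
no_controller.controller_pass("countLog")
print("countLog" in no_controller.external_view)  # True
```

In the last two runs the ideal state is gone but the external view is still there, which is the situation where listClusterInfo no longer shows the resource while listResourceInfo still does.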
If listResourceInfo is still showing the resource, it could be because of one of the following reasons:
  1. The partitions have not yet reached the DROPPED state. This should ideally finish in a few seconds, depending on what is done as part of the OFFLINE->DROPPED transition.
  2. One of the partitions went into the ERROR state. In this case, the resource's external view will continue to exist.
  3. No controller is running to delete the external view after all partitions went to the OFFLINE/DROPPED state.
Vinoth's case is #3. Hang, do you remember if your case was #1 or #2?
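
As a rough aid for telling the three cases apart, here is a sketch that maps an observed state to the reason numbers used above. The function name and its inputs are assumptions for illustration: it expects the leftover external view as a plain dict of partition name to state, plus a flag saying whether any controller is alive. It does not call Helix.

```python
def likely_reasons(external_view, controller_running):
    """Return the reason numbers (1, 2, 3) that fit the observed state.

    external_view: dict of partition -> state, or None if the view is gone.
    controller_running: True if at least one controller is alive.
    """
    if external_view is None:
        return []  # nothing left over, so nothing to explain
    states = list(external_view.values())
    reasons = set()
    if not controller_running:
        reasons.add(3)  # nobody is around to clean up the external view
    if "ERROR" in states:
        reasons.add(2)  # an ERROR partition keeps its entry
    if controller_running and any(s != "ERROR" for s in states):
        reasons.add(1)  # transitions to DROPPED are still in flight
    return sorted(reasons)


print(likely_reasons({"p0": "OFFLINE"}, controller_running=True))  # [1]
print(likely_reasons({"p0": "ERROR"}, controller_running=True))    # [2]
print(likely_reasons({}, controller_running=False))                # [3]
```

More than one reason can apply at once, which is why the sketch returns a list rather than a single number.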


Thanks,
Kishore G



On Wed, May 20, 2015 at 1:18 PM, Hang Qi <hangq.1985@gmail.com> wrote:
No, we have dedicated controllers.

We first created one resource, and later on we decided to create a new one and dropped the previous one. After the drop, listClusterInfo did not show that resource, but we were still able to run listResourceInfo on the dropped one. Meanwhile, in the application, we were still receiving callbacks/transitions for the dropped resource.

Thanks
Hang Qi

On Wed, May 20, 2015 at 6:44 AM, Vinoth Chandar <vinoth@uber.com> wrote:
Kishore and I chatted offline. The problem seems to be that there is still an external view for the resource, which Kishore tells me persists until a controller comes back up. (Other info: no live instances around.)

I am running my app with a distributed/embedded controller, which means when I shut down my instances the controller(s) died as well. I will try to reproduce this locally and report back.

@Hang, does this have any similarity to your usage?

On Tue, May 19, 2015 at 1:43 PM, Vinoth Chandar <vinoth@uber.com> wrote:
I did a ZK dump before I cleared everything out. Will investigate and send more info out.

@Kishore, dropResource did not error out. My memory is vague as it was the middle of the night :), but I think I shut everything down before I issued the CLI command.

Thanks
Vinoth

On Tue, May 19, 2015 at 12:50 PM, Hang Qi <hangq.1985@gmail.com> wrote:
Hi Vinoth,

We hit this issue before. What we did was use zk-dumper.sh to dump everything inside ZK, see where this resource still existed, and remove those paths in ZK; that worked.
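
The "see where this resource still exists" step could look roughly like the sketch below. It assumes the dump has already been reduced to a plain list of ZooKeeper paths; the function name and the example paths are hypothetical, and the real output format of zk-dumper.sh is not assumed here.

```python
def paths_mentioning(resource, dumped_paths):
    """Return the dumped ZK paths that have `resource` as a path component."""
    return [p for p in dumped_paths
            if resource in p.strip("/").split("/")]


# Made-up paths for illustration only.
dump = [
    "/streamio/EXTERNALVIEW/countLog",
    "/streamio/EXTERNALVIEW/countLogV2",
    "/streamio/IDEALSTATE/otherLog",
]
print(paths_mentioning("countLog", dump))
# ['/streamio/EXTERNALVIEW/countLog']
```

Matching on whole path components avoids false hits on resources that merely share a prefix, such as countLogV2 here. Review the resulting list by hand before removing anything from ZK.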

Unfortunately, we did not keep the state, so it would be great if you could share the paths that contain the resource you dropped; that would be helpful for debugging.

Thanks
Hang Qi

On Tue, May 19, 2015 at 11:10 AM, Vinoth Chandar <vinoth@uber.com> wrote:
Hi,

I dropped the resource already, but I am still seeing callbacks firing. I cannot list the resource using listResources.

$:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181 --dropResource streamio countLog
$:~/helix-core-0.6.5$ bin/helix-admin.sh --zkSvr zkmaster:2181 --listResourceInfo streamio countLog | tail -10
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "4096",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "OnlineOffline",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  }
}
$ bin/helix-admin.sh --zkSvr zkmaster:2181 --listResources streamio | grep countLog | wc -l
0

Any idea how to troubleshoot this?

Thanks
Vinoth



--
Qi hang





--
Qi hang
