From: Michael Craig <mcraig@box.com>
Date: Thu, 20 Oct 2016 10:28:23 -0700
Subject: Re: Correct way to redistribute work from disconnected instances?
To: user@helix.apache.org

That works! The cluster is automatically rebalancing when nodes
start/stop. This has raised other questions about rebalancing. Example
output below, and I updated the gist:
https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac93e7ca16e2824f8

- When NODE_0 restarts, why is the resource moved back? This seems like
  unhelpful churn in the cluster.
- Why does the resource stay in the OFFLINE state on NODE_0?

2-node cluster with a single resource with 1 partition/replica, using
OnlineOffline:

Starting ZooKeeper at localhost:2199
Setting up cluster THE_CLUSTER
Starting CONTROLLER
Starting NODE_0
Starting NODE_1
Adding resource THE_RESOURCE
Rebalancing resource THE_RESOURCE
Transition: NODE_0 OFFLINE to ONLINE for THE_RESOURCE
Cluster state after setup:
NODE_0: ONLINE
NODE_1: null
------------------------------------------------------------
Stopping NODE_0
Transition: NODE_1 OFFLINE to ONLINE for THE_RESOURCE
Cluster state after stopping first node:
NODE_0: null
NODE_1: ONLINE
------------------------------------------------------------
Starting NODE_0
Transition: NODE_1 ONLINE to OFFLINE for THE_RESOURCE
Transition: NODE_1 OFFLINE to DROPPED for THE_RESOURCE
Cluster state after restarting first node:
NODE_0: OFFLINE
NODE_1: null
------------------------------------------------------------
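For reference, here is a minimal, self-contained sketch of the FULL_AUTO
setup along the lines of Lei's suggestion quoted below. It assumes Helix
0.6.x's ZKHelixAdmin and the String-valued rebalance-mode overload of
addResource(); the cluster, resource, and state-model names mirror the
gist, and the class name is purely illustrative.

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState.RebalanceMode;

public class AddFullAutoResource {
  public static void main(String[] args) {
    // Illustrative values matching the gist above.
    String zkAddress = "localhost:2199";
    String clusterName = "THE_CLUSTER";
    String resourceName = "THE_RESOURCE";
    String stateModel = "OnlineOffline";
    int numPartitions = 1;
    int numReplicas = 1;

    HelixAdmin admin = new ZKHelixAdmin(zkAddress);

    // Pass the rebalance mode explicitly so the resource is created in
    // FULL_AUTO mode instead of the SEMI_AUTO default.
    admin.addResource(clusterName, resourceName, numPartitions, stateModel,
        RebalanceMode.FULL_AUTO.toString());

    // One-time rebalance to write the initial ideal state; after this the
    // controller reassigns the partition on its own as nodes come and go.
    admin.rebalance(clusterName, resourceName, numReplicas);
  }
}

With the resource created this way, no per-node rebalance calls should be
needed when instances join or leave the cluster.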
On Thu, Oct 20, 2016 at 9:18 AM, Lei Xia <lxia@linkedin.com> wrote:

> Hi, Michael
>
> To answer your questions:
>
> - Should you have to `rebalance` a resource when adding a new node to
>   the cluster?
>   --- No if you are using full-auto rebalance mode; yes if you are in
>   semi-auto rebalance mode.
> - Should you have to `rebalance` when a node is dropped?
>   -- Again, same answer: no, you do not need to in full-auto mode. In
>   full-auto mode, Helix is supposed to detect node
>   add/delete/online/offline and rebalance the resource automatically.
>
> The problem you saw was because your resource was created in SEMI-AUTO
> mode instead of FULL-AUTO mode. HelixAdmin.addResource() creates a
> resource in semi-auto mode by default if you do not specify a rebalance
> mode explicitly. Please see my comments below on how to fix it.
>
> static void addResource() throws Exception {
>   echo("Adding resource " + RESOURCE_NAME);
>   ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
>       STATE_MODEL_NAME);
>   ==> ADMIN.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS,
>       STATE_MODEL_NAME, RebalanceMode.FULL_AUTO);
>   echo("Rebalancing resource " + RESOURCE_NAME);
>   ADMIN.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);
>   // This just needs to be called once after the resource is created; no
>   // need to call it when there is a node change.
> }
>
> Please give it a try and let me know whether it works. Thanks!
>
> Lei
>
> On Wed, Oct 19, 2016 at 11:52 PM, Michael Craig <mcraig@box.com> wrote:
>
>> Here is some repro code for the "drop a node, resource is not
>> redistributed" case I described:
>> https://gist.github.com/mkscrg/bcb2ab1dd1b3e84ac93e7ca16e2824f8
>>
>> Can we answer these 2 questions? That would help clarify things:
>>
>> - Should you have to `rebalance` a resource when adding a new node to
>>   the cluster?
>>   - If no, this is an easy bug to reproduce. The example code calls
>>     rebalance after adding a node, and it breaks if you comment out
>>     that line.
>>   - If yes, what is the correct way to manage many resources on a
>>     cluster? Iterate through all resources and rebalance them for
>>     every new node?
>> - Should you have to `rebalance` when a node is dropped?
>>   - If no, there is a bug. See the repro code posted above.
>>   - If yes, we are in the same rebalance-every-resource situation as
>>     above.
>>
>> My use case is to manage a set of ad-hoc tasks across a cluster of
>> machines. Each task would be a separate resource with a unique name,
>> with 1 partition and 1 replica. Each resource would reside on exactly
>> 1 node, and there is no limit on the number of resources per node.
>>
>> On Wed, Oct 19, 2016 at 9:23 PM, Lei Xia <xiaxlei@gmail.com> wrote:
>>
>>> Hi, Michael
>>>
>>> Could you be more specific on the issue you see? Specifically:
>>> 1) For 1 resource and 2 replicas, you mean the resource has only 1
>>>    partition, with a replica count of 2, right?
>>> 2) You see REBALANCE_MODE="FULL_AUTO", not IDEALSTATE_MODE="AUTO",
>>>    in your idealState, right?
>>> 3) By dropping N1, you mean disconnecting N1 from helix/zookeeper,
>>>    so N1 is not in liveInstances, right?
>>>
>>> If your answers to all of the above questions are yes, then there may
>>> be some bug here. If possible, please paste your idealState and your
>>> test code (if there is any) here, and I will try to reproduce and
>>> debug it. Thanks
>>>
>>> Lei
>>>
>>> On Wed, Oct 19, 2016 at 9:02 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Can you describe your scenario in detail and the expected behavior?
>>>> I agree calling rebalance on every live instance change is ugly and
>>>> definitely not as per the design. It was an oversight (we focused a
>>>> lot on large numbers of partitions and failed to handle this simple
>>>> case).
>>>>
>>>> Please file a jira and we will work on that. Lei, do you think the
>>>> recent bug we fixed with AutoRebalancer will handle this case?
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>> On Wed, Oct 19, 2016 at 8:55 PM, Michael Craig <mcraig@box.com> wrote:
>>>>
>>>>> Thanks for the quick response Kishore. This issue is definitely
>>>>> tied to the condition that partitions * replicas < NODE_COUNT.
>>>>> If all running nodes have a "piece" of the resource, then they
>>>>> behave well when the LEADER node goes away.
>>>>>
>>>>> Is it possible to use Helix to manage a set of resources where that
>>>>> condition is true? I.e. where the *total* number of
>>>>> partitions/replicas in the cluster is greater than the node count,
>>>>> but each individual resource has a small number of
>>>>> partitions/replicas.
>>>>>
>>>>> (Calling rebalance on every liveInstance change does not seem like
>>>>> a good solution, because you would have to iterate through all
>>>>> resources in the cluster and rebalance each individually.)
>>>>>
>>>>> On Wed, Oct 19, 2016 at 12:52 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>>
>>>>>> I think this might be a corner case when partitions * replicas <
>>>>>> TOTAL_NUMBER_OF_NODES. Can you try with many partitions and
>>>>>> replicas and check if the issue still exists.
>>>>>>
>>>>>> On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mcraig@box.com> wrote:
>>>>>>
>>>>>>> I've noticed that partitions/replicas assigned to disconnected
>>>>>>> instances are not automatically redistributed to live instances.
>>>>>>> What's the correct way to do this?
>>>>>>>
>>>>>>> For example, given this setup with Helix 0.6.5:
>>>>>>> - 1 resource
>>>>>>> - 2 replicas
>>>>>>> - LeaderStandby state model
>>>>>>> - FULL_AUTO rebalance mode
>>>>>>> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>>>>>>>
>>>>>>> Then drop N1:
>>>>>>> - N2 becomes LEADER
>>>>>>> - Nothing happens to N3
>>>>>>>
>>>>>>> Naively, I would have expected N3 to transition from Offline to
>>>>>>> Standby, but that doesn't happen.
>>>>>>>
>>>>>>> I can force redistribution from
>>>>>>> GenericHelixController#onLiveInstanceChange by
>>>>>>> - dropping non-live instances from the cluster
>>>>>>> - calling rebalance
>>>>>>>
>>>>>>> The instance dropping seems pretty unsafe! Is there a better way?
>>>
>>> --
>>> Lei Xia
>
> --
> Lei Xia
> Senior Software Engineer
> Data Infra/Nuage & Helix
> LinkedIn
>
> lxia@linkedin.com
> www.linkedin.com/in/lxia1
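For context on the OFFLINE/ONLINE/DROPPED lines in the log at the top of
this thread, here is a minimal participant-side sketch of the OnlineOffline
state model and its registration. It assumes Helix 0.6.x's
HelixManagerFactory API and the 0.6.x StateModelFactory callback signature;
the class names and the NODE_0 instance name are illustrative, not taken
from the gist.

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.StateMachineEngine;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelFactory;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

public class OnlineOfflineParticipant {

  @StateModelInfo(initialState = "OFFLINE",
                  states = {"ONLINE", "OFFLINE", "DROPPED"})
  public static class OnlineOfflineStateModel extends StateModel {
    @Transition(from = "OFFLINE", to = "ONLINE")
    public void onBecomeOnlineFromOffline(Message msg, NotificationContext ctx) {
      // Start serving the partition on this instance.
      System.out.println("OFFLINE to ONLINE for " + msg.getResourceName());
    }

    @Transition(from = "ONLINE", to = "OFFLINE")
    public void onBecomeOfflineFromOnline(Message msg, NotificationContext ctx) {
      // Stop serving; the controller has moved the partition elsewhere.
      System.out.println("ONLINE to OFFLINE for " + msg.getResourceName());
    }

    @Transition(from = "OFFLINE", to = "DROPPED")
    public void onBecomeDroppedFromOffline(Message msg, NotificationContext ctx) {
      // Clean up local state; the partition has left this instance for good.
      System.out.println("OFFLINE to DROPPED for " + msg.getResourceName());
    }
  }

  // 0.6.x-style factory: one state model instance per partition.
  public static class Factory extends StateModelFactory<StateModel> {
    @Override
    public StateModel createNewStateModel(String partitionName) {
      return new OnlineOfflineStateModel();
    }
  }

  public static void main(String[] args) throws Exception {
    // Illustrative cluster name, instance name, and ZooKeeper address.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "THE_CLUSTER", "NODE_0", InstanceType.PARTICIPANT, "localhost:2199");
    StateMachineEngine engine = manager.getStateMachineEngine();
    engine.registerStateModelFactory("OnlineOffline", new Factory());
    manager.connect();
    Thread.currentThread().join();  // keep the participant alive
  }
}

The NODE_1 sequence in the log (ONLINE to OFFLINE, then OFFLINE to DROPPED)
is the controller first taking the partition offline on that instance and
then removing it from the instance's current-state mapping once it has been
placed elsewhere.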