Subject: Re: Questions about Helix
From: kishore g <g.kishore@gmail.com>
To: dev, user@helix.incubator.apache.org
Date: Mon, 25 Feb 2013 18:31:51 -0800

Thanks Abhishek. Glad you are enjoying playing with Helix. Apologies for the insufficient documentation; we have additional documentation that needs some cleanup, like converting it to markdown format and removing LinkedIn-specific material. It would be great if you could help us here.

The reason we read everything from ZooKeeper is to have a consistent snapshot of the system state. We have a lot of optimizations to read only changed data, and we use the ZooKeeper async APIs.

In general it is not a good idea to keep any state in memory in the controller: it makes it very difficult to reason about issues, and it also makes it hard to provide a fault-tolerant system.

You can add your code in the best-possible-state calculation stage and depend on the data in the cluster data cache. Do not worry about the existing messages or current state; we have other stages downstream (message selection) that make sure constraints are not violated.

Basically, the trick is to make the ideal-state calculation code idempotent: given a state machine, constraints, objectives, and a set of live nodes, it should come up with the same ideal state every time. If you can model your algorithm this way, you will be good.
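To make that concrete, here is a rough, untested sketch of such a stage. The class and attribute names (AbstractBaseStage, ClusterEvent, the "ClusterDataCache" event attribute) are from the current pipeline code as I remember it, so verify them against your checkout; the sorted-live-nodes trick is just one ingredient of determinism, not a full rebalancer:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.helix.controller.pipeline.AbstractBaseStage;
import org.apache.helix.controller.stages.ClusterDataCache;
import org.apache.helix.controller.stages.ClusterEvent;

// Sketch of a custom ideal-state calculation stage. It keeps no state of
// its own: every run reads the cluster data cache attached to the event,
// so a controller restart cannot leave it with a stale view.
public class CustomIdealStateCalcStage extends AbstractBaseStage {
  @Override
  public void process(ClusterEvent event) throws Exception {
    ClusterDataCache cache = event.getAttribute("ClusterDataCache");
    if (cache == null) {
      throw new IllegalStateException("ClusterDataCache missing from event");
    }
    // Sorting the live nodes gives every invocation the same input order,
    // which helps keep the calculation deterministic.
    List<String> liveNodes = new ArrayList<String>(cache.getLiveInstances().keySet());
    Collections.sort(liveNodes);
    // Compute the ideal state purely from (state model, constraints,
    // objectives, liveNodes) here, and attach the result to the event for
    // the downstream stages (message generation/selection) to consume.
  }
}

Because the stage derives everything from the cache, killing the controller and replaying the pipeline produces the same ideal state, which is exactly the idempotence property above.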
Your understanding about distributed controllers is right. I will provide more details on the website about running Helix in distributed mode, but you probably don't need it since you have only one cluster. You can simply start multiple controllers, and we ensure that only one will be active (the second sketch below shows how little that takes).

We do have a mechanism to test, simulate failures, and analyze logs. Unfortunately it uses a LinkedIn-internal tool to validate the logs for constraint violations. I will create a JIRA and post the idea and the implementation we have; you can help us take it to the next level.

If you get your algorithm to be idempotent, then CustomCodeInvoker might work for you; the first sketch below shows the typical wiring.
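Roughly like this, from memory of the participant APIs (HelixCustomCodeRunner and CustomCodeCallbackHandler under org.apache.helix.participant; the resource name is made up), so treat it as a sketch rather than gospel:

import org.apache.helix.HelixConstants.ChangeType;
import org.apache.helix.HelixManager;
import org.apache.helix.NotificationContext;
import org.apache.helix.participant.CustomCodeCallbackHandler;
import org.apache.helix.participant.HelixCustomCodeRunner;

public class CustomRebalancerRunner {
  // 'manager' is an already-connected HelixManager for your cluster.
  public static void run(HelixManager manager) throws Exception {
    CustomCodeCallbackHandler handler = new CustomCodeCallbackHandler() {
      @Override
      public void onCallback(NotificationContext context) {
        // Recompute the ideal state from scratch on every callback.
        // Because the calculation is idempotent, it does not matter
        // which node ends up running it, or how often it fires.
      }
    };
    new HelixCustomCodeRunner(manager, "customRebalancerLock") // hypothetical resource name
        .invoke(handler)
        .on(ChangeType.LIVE_INSTANCE)
        .usingLeaderElection()
        .start();
  }
}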
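And for the multiple-controllers setup mentioned above, starting a standalone controller is a one-liner around HelixControllerMain; the ZooKeeper address, cluster name, and controller name here are placeholders:

import org.apache.helix.HelixManager;
import org.apache.helix.controller.HelixControllerMain;

public class ControllerLauncher {
  public static void main(String[] args) throws Exception {
    // Start this on as many nodes as you like. Helix runs its own leader
    // election; only the elected controller drives the cluster, the rest
    // stay in standby until the leader goes away.
    HelixManager manager = HelixControllerMain.startHelixController(
        "zkhost:2181", "MYCLUSTER", "controller_1", HelixControllerMain.STANDALONE);
    Thread.currentThread().join(); // keep the process alive
  }
}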
Thanks again for brilliant questions.

On Feb 25, 2013 12:04 PM, "Abhishek Rai" <abhishekrai@gmail.com> wrote:

> Hi Helix experts,
>
> For the past few weeks I've been playing with Helix and would like to share
> some experiences and ask some questions.
>
> First of all, thanks to the Helix team for creating and open-sourcing such
> an awesome tool! I like the abstractions used by Helix and found the SOCC
> paper very helpful.
>
> My use case currently is managing a cluster containing a single DDS. In
> the future, we will need to manage about 5-6 different DDSs within the
> same cluster of machines. The DDS I'm managing needs customized
> rebalancing. I've set up a participant on each machine in the cluster, and
> a centralized controller manages the cluster.
>
> I am not sure what the best way is to integrate my rebalancing code with
> the Helix controller code. Kishore previously suggested adding a new stage
> to the controller's pipeline. An alternative that I've implemented is to
> subclass GenericHelixController and, in each listener callback, run my
> rebalancing code and write out ideal states using ZKHelixAdmin. The
> callbacks maintain an in-memory model of cluster state and do not read it
> from Zookeeper as part of the custom rebalancing functionality. In
> contrast, the pipeline stages used by GenericHelixController seem to read
> the data directly from Zookeeper every time. The pipeline stages are also
> aware of ongoing transitions, which my rebalancer code is not aware of.
> What is the recommended approach for adding custom rebalancing code?
>
> For high availability, I run 3 controllers on different nodes with custom
> leader election between them. When a controller starts, it waits to grab
> a Zookeeper lock, and then connects as a Helix controller. A controller
> that loses its lock dies and is restarted automatically by the shell. I
> tried using the "distributed controller" feature in Helix but couldn't
> get it to work. I kept seeing "initial cluster setup is not done..." in
> the controller logs. I tried a few things based on reading the Helix
> paper and docs (e.g. setting up another cluster and adding each controller
> as a participant to that cluster) but couldn't figure out how to make it
> work. I realize that I don't understand how the distributed controller
> feature works. Is the idea that each controller is a participant in
> another Helix cluster, and receives controllership of a DDS cluster as a
> "resource assignment"? In that case, is a "super" controller needed for
> this "super" cluster? If so, then how does one ensure HA of the super
> cluster?
>
> I've been stress-testing the system in production by repeatedly restarting
> controller and participant nodes, all while ensuring that Zookeeper stays
> up. I have run into some problems. Kishore helped triage one of them last
> week (https://issues.apache.org/jira/browse/HELIX-53). This problem was
> manifesting itself as messages of the following form in the participant
> and controller logs:
> ERROR org.apache.helix.controller.stages.MessageGenerationPhase: Unable to
> find a next state for partition XYZ from:SERVING to:OFFLINE
> and also
> ERROR ... Force CurrentState on Zk to be stateModel's CurrentState.
> I'm still getting some of these messages, but I can tell that the system
> is working fine overall now.
>
> What are the semantics of the persistent message queue between the
> controller and the participant? If the controller restarts or fails over
> while there are outstanding messages for existing participants, does the
> new controller honor the transitions implied by any outstanding messages?
> How does the participant acknowledge that it has executed the transition
> specified in a message? Does it do so by writing a new current state to
> Zookeeper, or by deleting the old message?
>
> Also, is there a testing framework distributed with Helix for integration
> testing of a customized Helix controller and participants? For example,
> something that would take care of scaffolding the cluster and provide
> hooks for simulating operational problems such as participant failures.
>
> Thanks for your help!
> Abhishek