Subject: Re: Questions about Helix
From: kishore g <g.kishore@gmail.com>
To: dev, user@helix.incubator.apache.org
Date: Mon, 25 Feb 2013 18:31:51 -0800

Thanks Abhishek. Glad you are enjoying playing with Helix. Apologies for the insufficient documentation; we have additional documentation that needs some cleanup, like converting it to markdown format and removing LinkedIn-specific material. It would be great if you could help us here.

The reason we read everything from ZooKeeper is to have a consistent snapshot of the system state. We have a lot of optimizations to read only changed data, and we use the ZooKeeper async APIs.

In general it is not a good idea to keep any state in memory in the controller: it makes it very difficult to reason about issues, and it also makes it hard to provide a fault-tolerant system.

You can add your code in the best-possible-state calculation stage and depend on the data in the cluster data cache. Do not worry about the existing messages or current state; we have other stages downstream (message selection) that make sure constraints are not violated.

Basically, the trick is to make the ideal-state calculation code idempotent: given a state machine, constraints, objectives, and a set of live nodes, it should come up with the same ideal state every time. If you can model your algorithm this way, you will be good.
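To make that concrete, here is a rough, untested sketch of such a stage. The class and attribute names (AbstractBaseStage, ClusterEvent, the "ClusterDataCache" event attribute) are from the current pipeline code as I remember it, so verify them against your checkout; the sorted-live-nodes trick is just one ingredient of determinism, not a full rebalancer:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.helix.controller.pipeline.AbstractBaseStage;
import org.apache.helix.controller.stages.ClusterDataCache;
import org.apache.helix.controller.stages.ClusterEvent;

// Sketch of a custom ideal-state calculation stage. It keeps no state of
// its own: every run reads the cluster data cache attached to the event,
// so a controller restart cannot leave it with a stale view.
public class CustomIdealStateCalcStage extends AbstractBaseStage {
  @Override
  public void process(ClusterEvent event) throws Exception {
    ClusterDataCache cache = event.getAttribute("ClusterDataCache");
    if (cache == null) {
      throw new IllegalStateException("ClusterDataCache missing from event");
    }
    // Sorting the live nodes gives every invocation the same input order,
    // which helps keep the calculation deterministic.
    List<String> liveNodes = new ArrayList<String>(cache.getLiveInstances().keySet());
    Collections.sort(liveNodes);
    // Compute the ideal state purely from (state model, constraints,
    // objectives, liveNodes) here, and attach the result to the event for
    // the downstream stages (message generation/selection) to consume.
  }
}

Because the stage derives everything from the cache, killing the controller and replaying the pipeline produces the same ideal state, which is exactly the idempotence property above.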
Your understanding about distributed controllers is right. I will provide more details on the website about running Helix in distributed mode, but you probably don't need it since you have only one cluster. You can simply start multiple controllers, and we ensure that only one will be active (the second sketch below shows how little that takes).

We do have a mechanism to test, simulate failures, and analyze logs. Unfortunately it uses a LinkedIn-internal tool to validate the logs for constraint violations. I will create a JIRA and post the idea and the implementation we have; you can help us take it to the next level.

If you get your algorithm to be idempotent, then CustomCodeInvoker might work for you; the first sketch below shows the typical wiring.
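Roughly like this, from memory of the participant APIs (HelixCustomCodeRunner and CustomCodeCallbackHandler under org.apache.helix.participant; the resource name is made up), so treat it as a sketch rather than gospel:

import org.apache.helix.HelixConstants.ChangeType;
import org.apache.helix.HelixManager;
import org.apache.helix.NotificationContext;
import org.apache.helix.participant.CustomCodeCallbackHandler;
import org.apache.helix.participant.HelixCustomCodeRunner;

public class CustomRebalancerRunner {
  // 'manager' is an already-connected HelixManager for your cluster.
  public static void run(HelixManager manager) throws Exception {
    CustomCodeCallbackHandler handler = new CustomCodeCallbackHandler() {
      @Override
      public void onCallback(NotificationContext context) {
        // Recompute the ideal state from scratch on every callback.
        // Because the calculation is idempotent, it does not matter
        // which node ends up running it, or how often it fires.
      }
    };
    new HelixCustomCodeRunner(manager, "customRebalancerLock") // hypothetical resource name
        .invoke(handler)
        .on(ChangeType.LIVE_INSTANCE)
        .usingLeaderElection()
        .start();
  }
}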
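And for the multiple-controllers setup mentioned above, starting a standalone controller is a one-liner around HelixControllerMain; the ZooKeeper address, cluster name, and controller name here are placeholders:

import org.apache.helix.HelixManager;
import org.apache.helix.controller.HelixControllerMain;

public class ControllerLauncher {
  public static void main(String[] args) throws Exception {
    // Start this on as many nodes as you like. Helix runs its own leader
    // election; only the elected controller drives the cluster, the rest
    // stay in standby until the leader goes away.
    HelixManager manager = HelixControllerMain.startHelixController(
        "zkhost:2181", "MYCLUSTER", "controller_1", HelixControllerMain.STANDALONE);
    Thread.currentThread().join(); // keep the process alive
  }
}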
Thanks again for brilliant questions.

On Feb 25, 2013 12:04 PM, "Abhishek Rai" <abhishekrai@gmail.com> wrote:

> Hi Helix experts,
>
> For the past few weeks I've been playing with Helix and would like to share
> some experiences and ask some questions.
>
> First of all, thanks to the Helix team for creating and open-sourcing such
> an awesome tool! I like the abstractions used by Helix and found the SOCC
> paper very helpful.
>
> My use case currently is managing a cluster containing a single DDS. In
> the future, we will need to manage about 5-6 different DDSs within the
> same cluster of machines. The DDS I'm managing needs customized
> rebalancing. I've set up a participant on each machine in the cluster, and
> a centralized controller manages the cluster.
>
> I am not sure what the best way is to integrate my rebalancing code with
> the Helix controller code. Kishore previously suggested adding a new stage
> to the controller's pipeline. An alternative that I've implemented is to
> subclass GenericHelixController and, in each listener callback, run my
> rebalancing code and write out ideal states using ZKHelixAdmin. The
> callbacks maintain an in-memory model of cluster state and do not read it
> from Zookeeper as part of the custom rebalancing functionality. In
> contrast, the pipeline stages used by GenericHelixController seem to read
> the data directly from Zookeeper every time. The pipeline stages are also
> aware of ongoing transitions, which my rebalancer code is not aware of.
> What is the recommended approach for adding custom rebalancing code?
>
> For high availability, I run 3 controllers on different nodes with custom
> leader election between them. When a controller starts, it waits to grab
> a Zookeeper lock, and then connects as a Helix controller. A controller
> that loses its lock dies and is restarted automatically by the shell. I
> tried using the "distributed controller" feature in Helix but couldn't
> get it to work. I kept seeing "initial cluster setup is not done..." in
> the controller logs. I tried a few things based on reading the Helix
> paper and docs (e.g. setting up another cluster and adding each controller
> as a participant to that cluster) but couldn't figure out how to make it
> work. I realize that I don't understand how the distributed controller
> feature works. Is the idea that each controller is a participant in
> another Helix cluster, and receives controllership of a DDS cluster as a
> "resource assignment"? In that case, is a "super" controller needed for
> this "super" cluster? If so, then how does one ensure HA of the super
> cluster?
>
> I've been stress-testing the system in production by repeatedly restarting
> controller and participant nodes, all while ensuring that Zookeeper stays
> up. I have run into some problems. Kishore helped triage one of them last
> week (https://issues.apache.org/jira/browse/HELIX-53). This problem was
> manifesting itself as messages of the following form in the participant
> and controller logs:
> ERROR org.apache.helix.controller.stages.MessageGenerationPhase: Unable to
> find a next state for partition XYZ from:SERVING to:OFFLINE
> and also
> ERROR ... Force CurrentState on Zk to be stateModel's CurrentState.
> I'm still getting some of these messages, but I can tell that the system
> is working fine overall now.
>
> What are the semantics of the persistent message queue between the
> controller and the participant? If the controller restarts or fails over
> while there are outstanding messages for existing participants, does the
> new controller honor the transitions implied by any outstanding messages?
> How does the participant acknowledge that it has executed the transition
> specified in a message? Does it do so by writing a new current state to
> Zookeeper, or by deleting the old message?
>
> Also, is there a testing framework distributed with Helix for integration
> testing of a customized Helix controller and participants? For example,
> something that would take care of scaffolding the cluster and provide
> hooks for simulating operational problems such as participant failures.
>
> Thanks for your help!
> Abhishek