helix-user mailing list archives

From Lance Co Ting Keh <la...@box.com>
Subject Re: General Architecture built around Helix
Date Fri, 14 Jun 2013 21:30:25 GMT
We have a working prototype of something very close to option #2 that you
proposed above. We're using the standard Helix participant, and the
@Transition methods of the state model send commands to node.js via HTTP.
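
A minimal sketch of the kind of HTTP command a transition handler might send
to its node.js process, written in Python for brevity rather than in the Java
of the actual participant. The /transition endpoint, the JSON field names and
both function names are assumptions made for illustration; they do not come
from Helix or from the prototype described here.

```python
import json
import urllib.request


def build_transition_request(base_url, partition, from_state, to_state):
    """Build (but do not send) the HTTP request for one state transition."""
    body = json.dumps({
        "partition": partition,
        "from": from_state,
        "to": to_state,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/transition",  # hypothetical endpoint on the node.js side
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_transition(request, timeout_seconds=5.0):
    """Send the request and return True only if node.js answered with 2xx.

    Connection errors, timeouts and HTTP error statuses all count as failure,
    so the caller can refuse to report the transition as completed.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except OSError:  # urllib.error.URLError and socket timeouts are OSErrors
        return False
```

Returning False on any failure keeps the decision with the caller: a handler
built this way would only treat the transition as done once node.js has
acknowledged the command.
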

I want to run you through our general architecture to make sure we are not
violating anything on the Helix side. As a reminder, what we need to
guarantee is that at any given time one and only one node.js process is in
charge of a task.
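
The guarantee can be written down as a check over a snapshot of who is ONLINE
for each task. This is only a sketch for use in tests: the snapshot shape
(task -> {instance name: state}), the instance names and the function name
are assumptions, not something Helix provides in this form.

```python
def find_ownership_violations(snapshot, tasks):
    """Return {task: [online owners]} for each task without exactly one owner.

    snapshot maps a task name to a dict of instance name -> state string.
    A task missing from the snapshot counts as having no owner.
    """
    violations = {}
    for task in tasks:
        states = snapshot.get(task, {})
        owners = sorted(
            name for name, state in states.items() if state == "ONLINE"
        )
        if len(owners) != 1:
            violations[task] = owners
    return violations


snapshot = {
    "task_0": {"box1_9001": "ONLINE", "box1_9002": "OFFLINE"},
    "task_1": {"box1_9001": "ONLINE", "box2_9001": "ONLINE"},
}
# task_1 has two owners and task_2 has none; task_0 is fine.
print(find_ownership_violations(snapshot, ["task_0", "task_1", "task_2"]))
```
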

1. A machine with N cores will run N node.js processes (N is pending
testing).
2. Each of the N node.js processes is paired with its own Helix participant,
so there are also N participants, each in a separate JVM instance (the
reason for this comes later).
3. A separate Helix controller will run on each machine, and the controllers
will simply leader-elect between machines.
4. The spectator router will likely be HAProxy, so a JVM running on that
Linux box will serve as the Helix spectator.
5. The state machine for each will simply be the OnlineOffline model.
(However, I do get error messages saying that I haven't defined an OFFLINE
to DROPPED transition. I was going to ask you about this, but it is a minor
detail compared to the rest of the architecture.)
6. A simple Bash script will serve as a watchdog on each node.js and Helix
participant pair. If either of the two is "dead", the other process must
immediately be SIGKILLed, hence the need for one JVM serving as Helix
participant for every node.js process.
7. Each node.js instance sets a watch on /LIVEINSTANCES directly in
ZooKeeper as an extra safety blanket. If it finds that it is NOT in the
live instances, it likely means that its JVM participant lost its connection
to ZooKeeper but the process is still running, so the Bash script has not
terminated the node server. In this case the node server must end its own
Thank you for all your help.


On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:

> Hi Lance,
> Thanks for your interest in Helix. There are two possible approaches
> 1. Similar to what you suggested: Write a Helix Participant in non-jvm
> language which in your case is node.js. There seem to be quite a few
> implementations in node.js that can interact with zookeeper. Helix
> participant does the following ( you got it right but i am providing right
> sequence)
>    1. Create an ephemeral node under LIVEINSTANCES
>    2. watches /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for transitions
>    3. After transition is completed it updates
> The controller does most of the heavy lifting of ensuring that these
> transitions lead to the desired configuration. It's quite easy to
> re-implement this in any other language; the most difficult thing would be
> the ZooKeeper binding. We have used the Java bindings and they are solid.
> This is at a very high level, there are some more details I have left out
> like handling connection loss/session expiry etc that will require some
> thinking.
> 2. The other option is to use the Helix agent as a proxy: we added the Helix
> agent as part of 0.6.1, but we haven't documented it yet. Here is the gist of
> what it does. Think of it as a generic state transition handler. You can
> configure Helix to run a specific system command as part of each
> transition. Helix agent is a separate process that runs along side your
> actual process. Instead of the actual process getting the transition, Helix
> Agent gets the transition. As part of this transition the Helix agent can
> invoke api's on the actual process via RPC, HTTP etc. Helix agent simply
> acts as a proxy to the actual process.
> I have another approach and will try to write it up tonight, but before
> that I have few questions
>    1. How many node.js servers run on each node: one or more than one?
>    2. Is the spectator/router Java or non-Java based?
>    3. Can you provide more details about your state machine?
> thanks,
> Kishore G
> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>> Hi my name is Lance Co Ting Keh and I work at Box. You guys did a
>> tremendous job with Helix. We are looking to use it to manage a cluster
>> primarily running Node.js. Our model for using Helix would be to have
>> node.js or some other non-JVM library be *Participants*, a router as a
>> *Spectator*, and another set of machines to serve as the *Controllers*
>> (pending testing, we may just run master-slave controllers on the same
>> instances as the Participants). The participants will be interacting with
>> Zookeeper in
>> two ways, one is to receive helix state transition messages through the
>> instance of the HelixManager <Participant>, and another is to directly
>> interact with Zookeeper just to maintain ephemeral nodes within /INSTANCES.
>> Maintaining ephemeral nodes directly to Zookeeper would be done instead of
>> using InstanceConfig and calling addInstance on HelixAdmin because of the
>> basic health checking baked into maintaining ephemeral nodes. If not we
>> would then have to write a health checker from Node.js and the JVM running
>> the Participant. Are there better alternatives for non-JVM Helix
>> participants? I corresponded with Kishore briefly and he mentioned
>> HelixAgents specifically ProcessMonitorThread that came out in the last
>> release.
>> Thank you very much!
>>  Lance Co Ting Keh
