helix-user mailing list archives

From Lance Co Ting Keh <la...@box.com>
Subject Re: General Architecture built around Helix
Date Sat, 15 Jun 2013 03:37:19 GMT
Thank you for your advice, Santiago. That is certainly part of the design as
well.


Best,
Lance


On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <santip@santip.com.ar> wrote:

> Helix user here (not a developer), so take my words with a grain of salt.
>
> Regarding #6, you might want to consider how the node.js instance behaves
> if that instance itself loses its connection to ZK. You'll probably want to
> kill it in that case too; otherwise it could miss the fact that the JVM has
> lost its connection as well.
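
Santiago's point can be made concrete as a small policy object. This is a Java sketch for illustration only; in this design the logic would live in the node.js watcher. The `ZkState` enum is a hypothetical stand-in for ZooKeeper's connection-state events, and the class name and timeout are assumptions. A session expiry triggers self-termination immediately. A disconnect that lasts at least the session timeout is treated the same way, because a disconnected client cannot tell whether the server has already expired its session.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical mirror of ZooKeeper client connection states
// (the real client reports these through Watcher events).
enum ZkState { CONNECTED, DISCONNECTED, EXPIRED }

// Decides whether the watcher process should kill itself.
// Sketch only: the policy and the timeout value are assumptions.
final class SelfFencingPolicy {
    private final Duration sessionTimeout;
    private Instant disconnectedSince; // null while connected

    SelfFencingPolicy(Duration sessionTimeout) {
        this.sessionTimeout = sessionTimeout;
    }

    // Returns true if the process should terminate after this event.
    boolean onEvent(ZkState state, Instant now) {
        switch (state) {
            case CONNECTED:
                disconnectedSince = null;
                return false;
            case EXPIRED:
                // Our session is gone, so our view of LIVEINSTANCES is no longer trustworthy.
                return true;
            case DISCONNECTED:
                if (disconnectedSince == null) {
                    disconnectedSince = now;
                }
                return shouldFence(now);
            default:
                throw new IllegalStateException("unknown state " + state);
        }
    }

    // Meant to be polled from a timer while disconnected: once the outage has
    // lasted a full session timeout, assume the session may have expired.
    boolean shouldFence(Instant now) {
        return disconnectedSince != null
                && Duration.between(disconnectedSince, now).compareTo(sessionTimeout) >= 0;
    }
}
```

For example, with a 30-second timeout, a disconnect that is still unresolved 30 seconds later triggers self-termination, while a reconnect before then resets the clock.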
>
> Regards,
> Santiago
>
>
> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <lance@box.com> wrote:
>
>> We have a working prototype of essentially option #2 that you proposed
>> above. We're using the standard Helix participant, and in the @Transition
>> methods of the state model we send commands to node.js via HTTP.
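
The following is a rough Java sketch of this prototype's shape, in which each transition callback becomes one HTTP command to the paired node.js process. The class name, base URL, and endpoint layout are hypothetical. The Helix `@Transition` annotation and `StateModel` base class are only described in comments, so the sketch compiles with the JDK alone (11 or later, for `java.net.http`). It also includes an OFFLINE to DROPPED handler, because Helix's OnlineOffline state model defines that transition. A missing handler for it is a likely cause of the error mentioned in item 5.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// In a real participant these methods would sit on a Helix StateModel subclass
// and carry @Transition(from = ..., to = ...); here they are plain methods.
final class NodeJsTransitionForwarder {
    private final URI nodeBase; // hypothetical address of the paired node.js process

    NodeJsTransitionForwarder(URI nodeBase) {
        this.nodeBase = nodeBase;
    }

    // Hypothetical endpoint layout: POST <base>/partitions/<partition>/<action>
    HttpRequest command(String partition, String action) {
        URI target = nodeBase.resolve("/partitions/" + partition + "/" + action);
        return HttpRequest.newBuilder(target)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    HttpRequest onBecomeOnlineFromOffline(String partition) {
        return command(partition, "online");
    }

    HttpRequest onBecomeOfflineFromOnline(String partition) {
        return command(partition, "offline");
    }

    // OnlineOffline also defines OFFLINE -> DROPPED, so it needs a handler too.
    HttpRequest onBecomeDroppedFromOffline(String partition) {
        return command(partition, "drop");
    }

    // Sends a command and throws on a non-2xx reply, so a failed command
    // propagates out of the transition callback instead of being ignored.
    static void send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException(
                    "node.js rejected " + request.uri() + " with " + response.statusCode());
        }
    }
}
```

Building the request separately from sending it keeps the mapping from transitions to commands easy to check without a running node.js process.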
>>
>> I want to run you through our general architecture to make sure we are
>> not violating anything on the Helix side. As a reminder, what we need to
>> guarantee is that at any given time one and only one node.js process is in
>> charge of a task.
>>
>> 1. A machine with N cores will run N node.js processes (the exact number
>> is pending testing).
>> 2. Each of the N node.js processes is paired with its own Helix
>> participant (a separate JVM instance; the reason for this comes later).
>> 3. A separate Helix controller will run on each machine, and the
>> controllers will simply leader-elect among the machines.
>> 4. The spectator/router will likely be HAProxy, so the Linux host running
>> it will also run a JVM to serve as the Helix spectator.
>> 5. The state model for each will simply be OnlineOffline. (However, I do
>> get error messages saying that I haven't defined an OFFLINE-to-DROPPED
>> transition. I was going to ask you about this, but it is a minor detail
>> compared to the rest of the architecture.)
>> 5. A simple Bash script will serve as a watchdog for each node.js/Helix
>> participant pair. If either of the two is "dead", the other process must
>> immediately be sent SIGKILL, hence the need for one JVM serving as a Helix
>> participant for every node.js process.
>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>> ZooKeeper as an extra safety blanket. If it finds that it is NOT among the
>> live instances, that likely means its JVM participant has lost its
>> connection to ZooKeeper while the process is still running, so the Bash
>> script has not terminated the node server. In this case the node server
>> must end its own process.
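
The per-machine rules above can be restated as a few pure functions. This Java sketch is illustrative only, since the watchdog in this design is a Bash script. The host name, base port, and `host_port` instance-naming scheme are assumptions. The functions size the pairs from the core count, give the watchdog's decision table for a pair, and give the item 6 self-check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PairLayout {
    enum Action { NONE, KILL_NODE, KILL_PARTICIPANT }

    // Items 1-2: by default, one pair per available core.
    static int defaultPairCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    // One node.js process plus one participant JVM per pair, each pair
    // identified by a Helix-style "host_port" instance name (assumed scheme).
    static List<String> instanceNames(String host, int pairs, int basePort) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < pairs; i++) {
            names.add(host + "_" + (basePort + i));
        }
        return names;
    }

    // Watchdog item: if exactly one side of a pair is dead, kill the survivor.
    static Action watchdog(boolean nodeAlive, boolean participantAlive) {
        if (nodeAlive && !participantAlive) {
            return Action.KILL_NODE;
        }
        if (!nodeAlive && participantAlive) {
            return Action.KILL_PARTICIPANT;
        }
        return Action.NONE; // both alive, or both already gone
    }

    // Item 6: node.js must exit if its own participant is not a live instance.
    static boolean mustExit(Set<String> liveInstances, String self) {
        return !liveInstances.contains(self);
    }

    // On Unix-like systems, ProcessHandle.destroyForcibly() sends SIGKILL.
    static boolean sigkill(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::destroyForcibly).orElse(false);
    }
}
```

Keeping the decisions as pure functions makes the one-and-only-one-owner rules easy to unit test, separately from the code that actually signals processes.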
>>
>> Thank you for all your help.
>>
>> Sincerely,
>> Lance
>>
>>
>>
>>
>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Hi Lance,
>>>
>>> Thanks for your interest in Helix. There are two possible approaches:
>>>
>>> 1. Similar to what you suggested: write a Helix participant in a non-JVM
>>> language, which in your case is node.js. There seem to be quite a few
>>> node.js libraries that can interact with ZooKeeper. A Helix participant
>>> does the following (you got it right, but I am providing the right
>>> sequence):
>>>
>>>    1. Create an ephemeral node under LIVEINSTANCES.
>>>    2. Watch the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>    transitions.
>>>    3. After a transition is completed, update
>>>    /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE.
>>>
>>> The controller does most of the heavy lifting of ensuring that these
>>> transitions lead to the desired configuration. It's quite easy to
>>> re-implement this in any other language; the most difficult part would be
>>> the ZooKeeper binding. We have used the Java bindings and they are solid.
>>> This is at a very high level; I have left out some details, such as
>>> handling connection loss and session expiry, that will require some
>>> thinking.
>>>
>>>
>>> 2. The other option is to use the Helix agent as a proxy. We added the
>>> Helix agent in 0.6.1 but haven't documented it yet. Here is the gist of
>>> what it does. Think of it as a generic state transition handler: you can
>>> configure Helix to run a specific system command as part of each
>>> transition. The Helix agent is a separate process that runs alongside your
>>> actual process. Instead of the actual process receiving the transition,
>>> the Helix agent receives it, and as part of handling it the agent can
>>> invoke APIs on the actual process via RPC, HTTP, etc. The Helix agent
>>> simply acts as a proxy to the actual process.
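
The three participant steps from approach 1 can be walked through with an in-memory Java sketch. Everything here is a hypothetical simplification. A `TreeMap` stands in for ZooKeeper, the flat paths mirror the ones listed above rather than Helix's exact layout, the `<partition>:<from>-><to>` message encoding is invented, and partitions are assumed to start OFFLINE. Connection loss and session expiry, which Kishore flags as needing more thought, are deliberately left out.

```java
import java.util.Map;
import java.util.TreeMap;

final class MiniParticipant {
    final Map<String, String> zk = new TreeMap<>(); // path -> data; stands in for ZooKeeper
    private final String name;

    MiniParticipant(String name) {
        this.name = name;
    }

    // Step 1: create the LIVEINSTANCES entry (an ephemeral node in real ZooKeeper).
    void register() {
        zk.put("/LIVEINSTANCES/" + name, "");
    }

    // The controller's side: drop a transition message for this participant.
    void deliver(String msgId, String message) {
        zk.put("/INSTANCES/" + name + "/MESSAGES/" + msgId, message);
    }

    // Steps 2 and 3: handle pending messages (a watch would trigger this),
    // then record the resulting state under CURRENTSTATE.
    void processMessages() {
        String prefix = "/INSTANCES/" + name + "/MESSAGES/";
        Map<String, String> pending = new TreeMap<>(); // copied so zk can be modified below
        zk.forEach((path, data) -> {
            if (path.startsWith(prefix)) {
                pending.put(path, data);
            }
        });
        for (Map.Entry<String, String> msg : pending.entrySet()) {
            String[] parts = msg.getValue().split(":|->");
            String partition = parts[0];
            String from = parts[1];
            String to = parts[2];
            String statePath = "/INSTANCES/" + name + "/CURRENTSTATE/" + partition;
            String current = zk.getOrDefault(statePath, "OFFLINE");
            if (!current.equals(from)) {
                throw new IllegalStateException(partition + " is " + current + ", expected " + from);
            }
            // The actual work (e.g. telling node.js to take the task) would go here.
            zk.put(statePath, to);
            zk.remove(msg.getKey()); // in this sketch, a handled message is consumed
        }
    }
}
```

The state check before applying a message reflects the division of labor described above: the controller decides the transitions, and the participant only verifies and applies them.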
>>>
>>> I have another approach and will try to write it up tonight, but before
>>> that I have a few questions:
>>>
>>>
>>>    1. How many node.js servers run on each node: one, or more than one?
>>>    2. Is the spectator/router Java-based or non-Java-based?
>>>    3. Can you provide more details about your state machine?
>>>
>>>
>>> thanks,
>>> Kishore G
>>>
>>>
>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>>>
>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>> tremendous job with Helix. We are looking to use it to manage a cluster
>>>> primarily running Node.js. Our model for using Helix would be to have
>>>> node.js or some other non-JVM library be *Participants*, a router as a
>>>> *Spectator*, and another set of machines to serve as the *Controllers*
>>>> (pending testing, we may just run master-slave controllers on the same
>>>> instances as the Participants). The participants will interact with
>>>> ZooKeeper in two ways: one is to receive Helix state transition messages
>>>> through an instance of HelixManager (as a Participant), and the other is
>>>> to interact directly with ZooKeeper just to maintain ephemeral nodes
>>>> within /INSTANCES. We would maintain ephemeral nodes directly in
>>>> ZooKeeper, instead of using InstanceConfig and calling addInstance on
>>>> HelixAdmin, because of the basic health checking built into ephemeral
>>>> nodes. Otherwise, we would have to write a health checker between Node.js
>>>> and the JVM running the Participant. Are there better alternatives for
>>>> non-JVM Helix participants? I corresponded with Kishore briefly and he
>>>> mentioned Helix agents, specifically the ProcessMonitorThread that came
>>>> out in the last release.
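
The hand-written health checker mentioned above would essentially be a heartbeat-timeout detector, sketched here in Java. The class name and timeout are hypothetical. ZooKeeper ephemeral nodes give a comparable check without extra code, because a session whose client stops heartbeating expires and its ephemeral nodes are deleted.

```java
import java.time.Duration;
import java.time.Instant;

// Minimal heartbeat-based failure detector between two co-located processes.
final class HeartbeatMonitor {
    private final Duration timeout;
    private Instant lastBeat;

    HeartbeatMonitor(Duration timeout, Instant start) {
        this.timeout = timeout;
        this.lastBeat = start;
    }

    // Called whenever the peer reports in (e.g. a ping from node.js);
    // out-of-order timestamps never move the clock backwards.
    void beat(Instant now) {
        if (now.isAfter(lastBeat)) {
            lastBeat = now;
        }
    }

    // The peer is presumed dead once no heartbeat arrived within the timeout.
    boolean isAlive(Instant now) {
        return Duration.between(lastBeat, now).compareTo(timeout) < 0;
    }
}
```

With a 10-second timeout, a peer last heard from 9 seconds ago counts as alive, while one silent for 10 seconds does not.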
>>>>
>>>>
>>>> Thank you very much!
>>>>
>>>>  Lance Co Ting Keh
>>>>
>>>
>>>
>>
>
