helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: General Architecture built around Helix
Date Mon, 17 Jun 2013 06:56:01 GMT
Hi Lance,

Looks good to me. Having a JVM per node.js server might add overhead; you
should definitely run this with a production configuration and ensure that
it does not impact performance. If you find it consuming too many
resources, you can try this approach:

   1. Have one agent per node.
   2. Instead of creating a separate Helix agent per node.js process, you can
   create multiple participants within the same agent. Each participant will
   represent one node.js process.
   3. The monitoring of participant LIVEINSTANCES and the killing of node.js
   processes can be done by one of the Helix agents. Create another
   resource using the leader-standby model. Only one Helix agent will be the
   leader; it will monitor the LIVEINSTANCES, and if any Helix agent dies it
   can ask the corresponding node.js servers to kill themselves (you can use
   HTTP or any other mechanism of your choice). The idea here is to designate
   one leader in the system to ensure that the Helix agent and node.js act
   like a pair.
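The leader-side check in step 3 could look roughly like the following minimal sketch, which models the logic in memory rather than talking to ZooKeeper. It assumes the leader knows which node.js servers each agent hosts participants for (the hypothetical `pairs` mapping) and that `request_kill` stands in for whatever HTTP or other call asks a node.js server to exit. `LeaderMonitor` and its method names are illustrative, not part of the Helix agent API.

```python
from typing import Callable, Dict, Iterable, Set


class LeaderMonitor:
    """Hypothetical leader-side check: ask node.js servers to exit when
    the Helix agent fronting them disappears from LIVEINSTANCES."""

    def __init__(self, pairs: Dict[str, Iterable[str]],
                 request_kill: Callable[[str], None]) -> None:
        # agent instance name -> node.js servers it hosts participants for
        self.pairs = {agent: list(servers) for agent, servers in pairs.items()}
        self.request_kill = request_kill  # stand-in for an HTTP/RPC call
        self.killed: Set[str] = set()

    def on_live_instances_changed(self, live_instances: Set[str],
                                  is_leader: bool) -> Set[str]:
        """Run on every LIVEINSTANCES change; return servers newly asked to exit."""
        if not is_leader:
            # Standby agents do nothing; only the current leader acts.
            return set()
        newly_killed: Set[str] = set()
        for agent, servers in self.pairs.items():
            if agent in live_instances:
                # Agent is (back) up: allow a future kill if it dies again.
                self.killed.difference_update(servers)
                continue
            for server in servers:
                if server not in self.killed:
                    self.request_kill(server)
                    self.killed.add(server)
                    newly_killed.add(server)
        return newly_killed
```

Remembering which servers were already asked keeps repeated LIVEINSTANCES callbacks from sending duplicate kill requests, while clearing that memory when an agent reappears lets the leader act again after a restart.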

Try this only if you find that the JVM overhead is significant with the
approach you have listed.

Kishore G

On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <lance@box.com> wrote:

> Thank you for your advice, Santiago. That is certainly part of the design
> as well.
> Best,
> Lance
> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <santip@santip.com.ar> wrote:
>> Helix user here (not a developer), so take my words with a grain of salt.
>> Regarding 6, you might want to consider the behavior of the node.js
>> instance if that instance itself loses its connection to ZooKeeper. You'll
>> probably want to kill it too; otherwise it could miss the fact that the
>> JVM lost its connection as well.
>> Regards,
>> Santiago
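Santiago's point can be folded into the node.js self-check from item 6. The sketch below is written in Python for illustration only, and the `ZkEvent` values are simplified stand-ins for a ZooKeeper client's connection states, not any particular node.js client's API. The rule of exiting once a disconnect has lasted at least the session timeout is an assumption, chosen as the conservative option: while disconnected, no watch events arrive, so the process cannot tell whether its paired JVM participant is still registered.

```python
from enum import Enum


class ZkEvent(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"  # link dropped; the session may still be valid
    EXPIRED = "expired"            # session gone; a new session is required


def should_self_terminate(event: ZkEvent, disconnected_for_s: float,
                          session_timeout_s: float,
                          in_live_instances: bool) -> bool:
    """Hypothetical fencing rule for a node.js server paired with a JVM participant.

    - Own session expired: our view of LIVEINSTANCES can no longer be
      trusted, so we cannot learn that the participant lost its session.
    - Disconnected for at least the session timeout: assume the worst.
    - Connected but the participant is missing from LIVEINSTANCES: the
      participant lost its session while its JVM kept running.
    """
    if event is ZkEvent.EXPIRED:
        return True
    if event is ZkEvent.DISCONNECTED:
        return disconnected_for_s >= session_timeout_s
    return not in_live_instances
```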
>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>> We have a working prototype of basically something like #2 you proposed
>>> above. We're using the standard Helix participant, and in the @Transition
>>> methods of the state model we send commands to node.js via HTTP.
>>> I want to run you through our general architecture to make sure we are
>>> not violating anything on the Helix side. As a reminder, what we need to
>>> guarantee is that at any given time one and only one node.js process is in
>>> charge of a task.
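The "forward each transition to node.js over HTTP" pattern can be sketched as below. In the actual Java participant the handlers would be `@Transition`-annotated methods on a Helix state model; this Python version only mirrors the idea. The `/transition` endpoint, the JSON payload, and the `ForwardingStateModel` name are hypothetical. The transition set covers OnlineOffline, including the OFFLINE to DROPPED transition that the model defines, which a participant therefore needs to handle.

```python
import json
import urllib.request
from typing import Callable

# OnlineOffline transitions handled here, including OFFLINE -> DROPPED.
TRANSITIONS = {
    ("OFFLINE", "ONLINE"),
    ("ONLINE", "OFFLINE"),
    ("OFFLINE", "DROPPED"),
}


def http_post(url: str, payload: dict, timeout_s: float = 5.0) -> int:
    """POST JSON to the node.js process and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status


class ForwardingStateModel:
    """Hypothetical participant-side handler that forwards transitions to node.js."""

    def __init__(self, partition: str, node_url: str,
                 send: Callable[[str, dict], int] = http_post) -> None:
        self.partition = partition
        self.node_url = node_url
        self.send = send  # injectable so the HTTP call can be swapped out
        self.state = "OFFLINE"

    def on_transition(self, from_state: str, to_state: str) -> None:
        if (from_state, to_state) not in TRANSITIONS:
            raise ValueError(f"no handler for {from_state} -> {to_state}")
        if from_state != self.state:
            raise ValueError(f"partition is {self.state}, not {from_state}")
        status = self.send(f"{self.node_url}/transition",
                           {"partition": self.partition,
                            "from": from_state, "to": to_state})
        if not 200 <= status < 300:
            # Fail the transition rather than record a state node.js never reached.
            raise RuntimeError(f"node.js returned HTTP {status}")
        self.state = to_state
```

Raising on a failed forward means the participant reports the transition as failed instead of claiming ownership of a task that node.js never took, which matters for the one-owner guarantee.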
>>> 1. A machine with N cores will have N (pending testing) node.js
>>> processes running
>>> 2. Each of the N node processes is paired with its own Helix
>>> participant (a separate JVM instance; the reason for this comes later).
>>> 3. A separate Helix controller will run on the machine and will
>>> simply leader-elect between machines.
>>> 4. The spectator/router will likely be HAProxy, so a Linux box will also
>>> run a JVM to serve as the Helix spectator.
>>> 5. The state model for each will simply be OnlineOffline.
>>> (However, I do get error messages saying that I haven't defined an
>>> OFFLINE to DROPPED transition. I was going to ask you about this, but it
>>> is a minor detail compared to the rest of the architecture.)
>>> 5. A simple Bash script will serve as a watchdog on each node.js and
>>> Helix participant pair. If either of the two is "dead", the other process
>>> must immediately be SIGKILLed, hence the need for one JVM serving as a
>>> Helix participant for every node.js process.
>>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>>> ZooKeeper as an extra safety blanket. If it finds that it is NOT among the
>>> live instances, it likely means that its JVM participant lost its
>>> connection to ZooKeeper while the JVM process is still running, so the
>>> Bash script has not terminated the node server. In this case the node
>>> server must end its own process.
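The watchdog in the second item 5 is planned as a Bash script; the sketch below expresses the same pairing rule in Python so it can be tested. It assumes the watchdog itself launched both processes and therefore holds their `subprocess.Popen` handles. `Popen.kill()` sends SIGKILL on POSIX systems. The function names are hypothetical.

```python
import subprocess
import time
from typing import Optional


def enforce_pair(node: subprocess.Popen,
                 participant: subprocess.Popen) -> Optional[str]:
    """If one side of the pair has exited, SIGKILL the other.

    Returns "node" or "participant" for whichever process was killed, else None.
    """
    node_dead = node.poll() is not None
    participant_dead = participant.poll() is not None
    if node_dead and not participant_dead:
        participant.kill()  # SIGKILL on POSIX
        participant.wait()  # reap it so no zombie is left behind
        return "participant"
    if participant_dead and not node_dead:
        node.kill()
        node.wait()
        return "node"
    return None


def watch_pair(node: subprocess.Popen, participant: subprocess.Popen,
               interval_s: float = 0.5) -> Optional[str]:
    """Poll until either process dies, then make sure both are gone."""
    while True:
        killed = enforce_pair(node, participant)
        if killed is not None:
            return killed
        if node.poll() is not None and participant.poll() is not None:
            return None  # both already exited on their own
        time.sleep(interval_s)
```

Polling leaves a window of up to `interval_s` during which one side may run alone; the LIVEINSTANCES watch in item 6 covers the case the watchdog cannot see, where the JVM stays up but has lost its ZooKeeper session.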
>>> Thank you for all your help.
>>> Sincerely,
>>> Lance
>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:
>>>> Hi Lance,
>>>> Thanks for your interest in Helix. There are two possible approaches:
>>>> 1. Similar to what you suggested: write a Helix participant in a non-JVM
>>>> language, which in your case is node.js. There seem to be quite a few
>>>> node.js implementations that can interact with ZooKeeper. A Helix
>>>> participant does the following (you got it right, but I am providing the
>>>> right sequence):
>>>>    1. Creates an ephemeral node under LIVEINSTANCES
>>>>    2. Watches the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>    transitions
>>>>    3. After a transition is completed, it updates
>>>> The controller does most of the heavy lifting of ensuring that these
>>>> transitions lead to the desired configuration. It's quite easy to
>>>> re-implement this in any other language; the most difficult part would be
>>>> the ZooKeeper binding. We have used the Java bindings and they are solid.
>>>> This is at a very high level; there are some more details I have left
>>>> out, like handling connection loss/session expiry, that will require some
>>>> thinking.
>>>> 2. The other option is to use the Helix agent as a proxy. We added the
>>>> Helix agent as part of 0.6.1 but haven't documented it yet. Here is the
>>>> gist of what it does. Think of it as a generic state transition handler:
>>>> you can configure Helix to run a specific system command as part of each
>>>> transition. The Helix agent is a separate process that runs alongside your
>>>> actual process. Instead of the actual process getting the transition, the
>>>> Helix agent gets it, and as part of the transition it can invoke APIs on
>>>> the actual process via RPC, HTTP, etc. The Helix agent simply acts as a
>>>> proxy to the actual process.
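The "generic state transition handler that runs a system command" idea behind option 2 can be sketched as a lookup table from (from, to) state pairs to commands. This does not reproduce the Helix agent's actual configuration format, which this thread does not describe: the table shape, the convention of appending the partition name as the last argument, and `run_transition` itself are all assumptions for illustration.

```python
import subprocess
from typing import Dict, List, Tuple

# Hypothetical per-transition command table: (from_state, to_state) -> argv.
CommandTable = Dict[Tuple[str, str], List[str]]


def run_transition(table: CommandTable, from_state: str, to_state: str,
                   partition: str, timeout_s: float = 30.0) -> str:
    """Run the command configured for one transition and return its stdout.

    Raises ValueError if no command is configured, RuntimeError if it fails.
    """
    try:
        cmd = table[(from_state, to_state)]
    except KeyError:
        raise ValueError(
            f"no command configured for {from_state} -> {to_state}") from None
    # Assumed contract: the partition name is passed as the final argument.
    result = subprocess.run(cmd + [partition], capture_output=True,
                            text=True, timeout=timeout_s)
    if result.returncode != 0:
        # A non-zero exit marks the transition as failed.
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}")
    return result.stdout
```

Each configured command could itself be a small client that calls the node.js process over HTTP, which is how the agent ends up acting as a proxy.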
>>>> I have another approach and will try to write it up tonight, but before
>>>> that I have a few questions:
>>>>    1. How many node.js servers run on each node: one or more than one?
>>>>    2. Is the spectator/router Java or non-Java based?
>>>>    3. Can you provide more details about your state machine?
>>>> thanks,
>>>> Kishore G
>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>>> tremendous job with Helix. We are looking to use it to manage a cluster
>>>>> primarily running Node.js. Our model for using Helix would be to have
>>>>> node.js or some other non-JVM library be *Participants*, a router as
>>>>> a *Spectator*, and another set of machines serve as the
>>>>> *Controllers* (pending testing, we may just run master-slave
>>>>> controllers on the same instances as the Participants). The participants
>>>>> will interact with ZooKeeper in two ways: one is to receive Helix
>>>>> state transition messages through the instance of the HelixManager
>>>>> <Participant>, and the other is to interact directly with ZooKeeper just
>>>>> to maintain ephemeral nodes within /INSTANCES. Maintaining ephemeral nodes
>>>>> directly in ZooKeeper would be done instead of using InstanceConfig and
>>>>> calling addInstance on HelixAdmin because of the basic health checking
>>>>> baked into maintaining ephemeral nodes. Otherwise we would have to build
>>>>> a health checker between Node.js and the JVM running the Participant. Are
>>>>> there better alternatives for non-JVM Helix participants? I corresponded
>>>>> with Kishore briefly and he mentioned HelixAgents, specifically the
>>>>> ProcessMonitorThread that came out in the last release.
>>>>> Thank you very much!
>>>>>  Lance Co Ting Keh
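The "basic health checking baked into ephemeral nodes" that Lance relies on, and that step 1 of Kishore's participant sequence uses, comes down to one property: an ephemeral node exists only as long as the session that created it. The in-memory model below illustrates just that property; it is not a ZooKeeper client, and the paths and instance names are hypothetical. In real ZooKeeper, ephemeral nodes are removed when the session expires, not immediately when a TCP connection drops, and watches are one-shot and must be re-registered, whereas this sketch keeps persistent callbacks for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class FakeZk:
    """In-memory stand-in for ZooKeeper ephemeral nodes and child watches."""

    def __init__(self) -> None:
        # parent path -> {child name: owning session id}
        self.children: Dict[str, Dict[str, int]] = defaultdict(dict)
        self.watchers: Dict[str, List[Callable[[Set[str]], None]]] = defaultdict(list)

    def create_ephemeral(self, parent: str, name: str, session: int) -> None:
        if name in self.children[parent]:
            raise KeyError(f"{parent}/{name} already exists")
        self.children[parent][name] = session
        self._notify(parent)

    def close_session(self, session: int) -> None:
        """Session ended (crash or expiry): all its ephemeral nodes disappear."""
        for parent, kids in list(self.children.items()):
            owned = [name for name, owner in kids.items() if owner == session]
            for name in owned:
                del kids[name]
            if owned:
                self._notify(parent)

    def watch_children(self, parent: str,
                       callback: Callable[[Set[str]], None]) -> None:
        self.watchers[parent].append(callback)

    def _notify(self, parent: str) -> None:
        names = set(self.children[parent])
        for callback in self.watchers[parent]:
            callback(names)
```

For example, a watcher on `/LIVEINSTANCES` sees an instance vanish as soon as its owning session is closed, with no separate heartbeat between node.js and the JVM.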
