helix-user mailing list archives

From Lance Co Ting Keh <la...@box.com>
Subject Re: General Architecture built around Helix
Date Tue, 18 Jun 2013 01:15:07 GMT
Thank you, Kishore. I'll definitely measure the memory consumption of one JVM
per node.js server first. If it's too much, we'll likely go with your proposed
design but execute kills via the OS. This is to ensure there are no rogue servers.

I have a small implementation question. When I call new ZKHelixAdmin and the
connection fails, it retries again and again indefinitely (val admin = new
ZKHelixAdmin("")). Is there a method I can override to limit the number of
reconnects and just have it fail?
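
Independent of whether Helix exposes a setting for this, one generic way to
bound a constructor that may block forever is to run it on a worker thread and
wait with a timeout. The following is a minimal Java sketch of that idea, not
a Helix API: the class name BoundedConnect and the method callWithTimeout are
hypothetical, and the worker thread is interrupted on timeout, which only
stops the call if the blocked code responds to interrupts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Helix): bounds how long a blocking call may take.
final class BoundedConnect {

    /**
     * Runs the factory on a daemon worker thread and returns its result, or
     * throws TimeoutException if it has not finished within timeoutMillis.
     */
    static <T> T callWithTimeout(Callable<T> factory, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-connect");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(factory);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; code that ignores interrupts keeps running
            throw e;
        } catch (ExecutionException e) {
            // Surface the factory's own exception rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With this helper, the admin could be created as
BoundedConnect.callWithTimeout(() -> new ZKHelixAdmin(zkAddress), 5000), where
zkAddress is a placeholder for the ZooKeeper connect string, and the caller
can treat a TimeoutException as a failed connection.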


On Sun, Jun 16, 2013 at 11:56 PM, kishore g <g.kishore@gmail.com> wrote:

> Hi Lance,
> Looks good to me. Having a JVM per node.js server might add additional
> overhead; you should definitely run this with a production configuration and
> ensure that it does not impact performance. If you find it consuming too
> many resources, you can probably try this approach.
>    1. Have one agent per node
>    2. Instead of creating a separate helix agent per node.js, you can
>    create multiple participants within the same agent. Each participant will
>    represent a node.js process.
>    3. The monitoring of participant LIVEINSTANCE and killing of node.js
>    process can be done by one of the helix agents. You create another
>    resource using the leader-standby model. Only one helix agent will be the
>    leader and it will monitor the LIVEINSTANCES, and if any Helix Agent dies it
>    can ask the node.js servers to kill themselves (you can use http or any other
>    mechanism of your choice). The idea here is to designate one leader in the
>    system to ensure that helix-agent and node.js act like a pair.
> You can try this only if you find that the overhead of the JVM is significant
> with the approach you have listed.
> Thanks,
> Kishore G
> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <lance@box.com> wrote:
>> Thank you for your advice, Santiago. That is certainly part of the design
>> as well.
>> Best,
>> Lance
>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <santip@santip.com.ar> wrote:
>>> Helix user here (not developer) so take my words with a grain of salt.
>>> Regarding 6, you might want to consider the behavior of the node.js
>>> instance if that instance itself loses its connection to zk. You'll probably
>>> want to kill it in that case too; otherwise it could miss the fact that the
>>> JVM lost the connection as well.
>>> Regards,
>>> Santiago
>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>> We have a working prototype of basically something like #2 you proposed
>>>> above. We're using the standard helix participant, and in the @Transition
>>>> handlers of the state model we send commands to node.js via HTTP.
>>>> I want to run you through our general architecture to make sure we are
>>>> not violating anything on the Helix side. As a reminder, what we need to
>>>> guarantee is that at any given time one and only one node.js process is in
>>>> charge of a task.
>>>> 1. A machine with N cores will have N (pending testing) node.js
>>>> processes running
>>>> 2. Associated with each of the N node processes are also N Helix
>>>> participants (separate JVM instances -- reason for this to come later)
>>>> 3. A separate helix controller will be running on the machine and will
>>>> just leader elect between machines.
>>>> 4. The spectator router will likely be HAProxy, and thus a linux kernel
>>>> will run a JVM to serve as the Helix spectator
>>>> 5. The state machine for each will simply be the OnlineOffline model.
>>>> (However, I do get error messages that say that I haven't defined an OFFLINE
>>>> to DROPPED transition. I was going to ask you about this, but it is a minor
>>>> detail compared to the rest of the architecture.)
>>>> 5. A simple Bash script will serve as a watchdog on each node.js and
>>>> helix participant pair. If either of the two is "dead", the other process must
>>>> immediately be SIGKILLED, hence the need for one JVM serving as Helix
>>>> Participant for every Node.js
>>>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>>>> zookeeper as an extra safety blanket. If it finds that it is NOT in the
>>>> liveinstances, it likely means that its JVM participant lost its connection
>>>> to Zookeeper, but the process is still running so the bash script has not
>>>> terminated the node server. In this case the node server must end its own
>>>> process.
>>>> Thank you for all your help.
>>>> Sincerely,
>>>> Lance
>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>> Hi Lance,
>>>>> Thanks for your interest in Helix. There are two possible approaches:
>>>>> 1. Similar to what you suggested: Write a Helix Participant in a non-JVM
>>>>> language, which in your case is node.js. There seem to be quite a few
>>>>> implementations in node.js that can interact with zookeeper. A Helix
>>>>> participant does the following (you got it right, but I am providing the
>>>>> sequence):
>>>>>    1. Create an ephemeral node under LIVEINSTANCES
>>>>>    2. watches /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>    transitions
>>>>>    3. After transition is completed it updates
>>>>> The controller does most of the heavy lifting of ensuring that these
>>>>> transitions lead to the desired configuration. It's quite easy to
>>>>> re-implement this in any other language; the most difficult thing would be
>>>>> the zookeeper binding. We have used the java bindings and they are solid.
>>>>> This is at a very high level; there are some more details I have left
>>>>> out, like handling connection loss/session expiry etc., that will require
>>>>> thinking.
>>>>> 2. The other option is to use the Helix agent as a proxy: We added the
>>>>> Helix agent as part of 0.6.1; we haven't documented it yet. Here is an
>>>>> outline of what it does. Think of it as a generic state transition handler.
>>>>> You configure Helix to run a specific system command as part of each
>>>>> transition. The Helix agent is a separate process that runs alongside your
>>>>> actual process. Instead of the actual process getting the transition, the
>>>>> agent gets the transition. As part of this transition the Helix agent
>>>>> invokes APIs on the actual process via RPC, HTTP etc. The Helix agent simply
>>>>> acts as a proxy to the actual process.
>>>>> I have another approach and will try to write it up tonight, but
>>>>> before that I have a few questions:
>>>>>    1. How many node.js servers run on each node, one or >1?
>>>>>    2. Is the spectator/router java or non-java based?
>>>>>    3. Can you provide more details about your state machine?
>>>>> thanks,
>>>>> Kishore G
>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>>>> tremendous job with Helix. We are looking to use it to manage a cluster
>>>>>> primarily running Node.js. Our model for using Helix would be to
>>>>>> have node.js or some other non-JVM library be *Participants*, a
>>>>>> router as a *Spectator* and another set of machines to serve as the
>>>>>> *Controllers* (pending testing we may just run master-slave
>>>>>> controllers on the same instances as the Participants). The participants
>>>>>> will be interacting with Zookeeper in two ways: one is to receive
>>>>>> state transition messages through the instance of the HelixManager
>>>>>> <Participant>, and another is to directly interact with Zookeeper just to
>>>>>> maintain ephemeral nodes within /INSTANCES. Maintaining ephemeral nodes
>>>>>> directly in Zookeeper would be done instead of using InstanceConfig and
>>>>>> calling addInstance on HelixAdmin because of the basic health checking
>>>>>> baked into maintaining ephemeral nodes. If not, we would then have to write
>>>>>> a health checker from Node.js and the JVM running the Participant. Are
>>>>>> there better alternatives for non-JVM Helix participants? I corresponded
>>>>>> with Kishore briefly and he mentioned HelixAgents, specifically
>>>>>> ProcessMonitorThread, that came out in the last release.
>>>>>> Thank you very much!
>>>>>>  Lance Co Ting Keh
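
The pairing rule discussed throughout this thread (a node.js server must not
keep running once its Helix participant is missing from LIVEINSTANCES) comes
down to a set comparison. Below is a minimal Java sketch of just that decision
step, under stated assumptions: the class PairWatchdog, the method
serversToKill, and the server and participant names are all hypothetical, and
reading the live instance names from ZooKeeper and actually killing a process
(via HTTP or the OS) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not part of Helix): decides which node.js servers are orphaned.
final class PairWatchdog {

    /**
     * Returns the names of node.js servers whose paired participant is absent
     * from the given live-instance set, in sorted order of server name.
     *
     * @param participantByServer maps each node.js server name to its participant name
     * @param liveInstances       participant names currently listed as live
     */
    static List<String> serversToKill(Map<String, String> participantByServer,
                                      Set<String> liveInstances) {
        List<String> toKill = new ArrayList<>();
        // TreeMap gives a deterministic iteration order regardless of the input map type.
        for (Map.Entry<String, String> pair : new TreeMap<>(participantByServer).entrySet()) {
            if (!liveInstances.contains(pair.getValue())) {
                toKill.add(pair.getKey());
            }
        }
        return toKill;
    }
}
```

The same check could run in whichever component is chosen as the monitor: the
per-pair watchdog, the leader agent from the leader-standby proposal, or the
node.js server checking only its own entry.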
