helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: General Architecture built around Helix
Date Tue, 18 Jun 2013 03:09:38 GMT
Hi Lance,

Looks like we are not setting the connection timeout while connecting to
ZooKeeper in ZkHelixAdmin.

The fix is to change line 99 in ZkHelixAdmin.java from

    _zkClient = new ZkClient(zkAddress);

to

    _zkClient = new ZkClient(zkAddress, timeout * 1000);

where timeout is the desired connection timeout in seconds.

Another workaround is to get the HelixAdmin from a HelixManager:

    HelixManager manager = HelixManagerFactory.getZKHelixManager(cluster, "Admin",
        InstanceType.ADMINISTRATOR, zkAddress);
    manager.connect();
    HelixAdmin admin = manager.getClusterManagmentTool();

This waits up to 60 seconds for the connection before failing.
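The difference between the endless retries Lance sees and Kishore's fix is a deadline on the connection attempt (ZkClient's second constructor argument is a connection timeout in milliseconds, hence the "* 1000"). A minimal sketch of that pattern in plain Java follows. BoundedConnect and connectWithin are hypothetical names, the connection attempt is abstracted as a BooleanSupplier, and none of this is Helix code.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Helix): retry a connection attempt until it
// succeeds or a deadline passes, instead of retrying forever.
final class BoundedConnect {

    /**
     * Calls attempt repeatedly, sleeping backoff between tries, and throws
     * TimeoutException once timeout has elapsed without success.
     * Returns the number of attempts it took to connect.
     */
    static int connectWithin(BooleanSupplier attempt, Duration timeout, Duration backoff)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts; // connected
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "not connected after " + attempts + " attempts within " + timeout);
            }
            Thread.sleep(backoff.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Simulates an unreachable server: gives up after about 200 ms instead of hanging.
            connectWithin(() -> false, Duration.ofMillis(200), Duration.ofMillis(50));
        } catch (TimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With a 60-second timeout and a real connection attempt plugged in as the supplier, this mirrors the fail-after-a-minute behavior described for HelixManager.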
Kishore G

On Mon, Jun 17, 2013 at 6:15 PM, Lance Co Ting Keh <lance@box.com> wrote:

> Thank you, Kishore. I'll definitely test the memory consumption of one JVM
> per node.js server first. If it's too much, we'll likely adopt your proposed
> design but execute the kills via the OS, to ensure there are no rogue servers.
> I have a small implementation question: when the call to new ZkHelixAdmin
> fails, it retries again and again indefinitely (val admin = new
> ZKHelixAdmin("")). Is there a method I can override to limit the number of
> reconnects and just have it fail?
> Lance
> On Sun, Jun 16, 2013 at 11:56 PM, kishore g <g.kishore@gmail.com> wrote:
>> Hi Lance,
>> Looks good to me. Having a JVM per node.js server might add additional
>> overhead; you should definitely run this with a production configuration and
>> ensure that it does not impact performance. If you find it consuming too
>> many resources, you can try the following approach.
>>    1. Have one agent per node.
>>    2. Instead of creating a separate Helix agent per node.js process, you can
>>    create multiple participants within the same agent. Each participant
>>    represents one node.js process.
>>    3. The monitoring of participant LIVEINSTANCES and the killing of node.js
>>    processes can be done by one of the Helix agents. Create another
>>    resource using the leader-standby model. Only one Helix agent will be the
>>    leader; it monitors the LIVEINSTANCES, and if any Helix agent dies it
>>    can ask the corresponding node.js servers to kill themselves (you can use
>>    HTTP or any other mechanism of your choice). The idea here is to designate
>>    one leader in the system to ensure that the Helix agent and node.js act as a pair.
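Step 3 boils down to a set difference: every participant that should exist but is missing from LIVEINSTANCES marks a node.js server that must be told to exit. The sketch below shows only that decision in plain Java. The participant-to-server map, the host_port instance names and the URLs are hypothetical, and in a real agent the live set would come from a ZooKeeper watch on LIVEINSTANCES, evaluated only on the leader of the leader-standby resource.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the leader's check from step 3; not Helix API.
final class OrphanDetector {

    /**
     * @param pairs         participant instance name -> URL of the node.js server it represents
     * @param liveInstances instance names currently present under LIVEINSTANCES
     * @return URLs of node.js servers whose participant is no longer live (sorted)
     */
    static Set<String> serversToKill(Map<String, String> pairs, Set<String> liveInstances) {
        Set<String> doomed = new TreeSet<>();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!liveInstances.contains(e.getKey())) {
                doomed.add(e.getValue());
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = Map.of(
                "host1_12000", "http://host1:8000",
                "host1_12001", "http://host1:8001");
        // host1_12001 has dropped out of LIVEINSTANCES, so its node.js server is asked to exit.
        System.out.println(serversToKill(pairs, Set.of("host1_12000"))); // [http://host1:8001]
    }
}
```

Keeping the decision pure makes it easy to re-run on every LIVEINSTANCES change notification; the actual kill request (HTTP or otherwise) is left to the caller.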
>> Try this only if you find that the JVM overhead is significant
>> with the approach you have listed.
>> Thanks,
>> Kishore G
>> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>> Thank you for your advice, Santiago. That is certainly part of the design
>>> as well.
>>> Best,
>>> Lance
>>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <santip@santip.com.ar> wrote:
>>>> Helix user here (not a developer), so take my words with a grain of salt.
>>>> Regarding 6, you might want to consider the behavior of the node.js
>>>> instance if that instance itself loses its connection to ZK. You'll probably
>>>> want to kill it in that case too; otherwise you could miss the fact that the
>>>> JVM lost its connection as well.
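Santiago's point can be expressed as a self-fencing rule for the node.js side: exit when the paired participant disappears from LIVEINSTANCES, and also exit when the server's own ZooKeeper connection has been down too long to trust its cached view of LIVEINSTANCES. The sketch is in Java for consistency with the rest of the thread, although the real check would live in node.js; SelfFence, shouldExit and the grace period are hypothetical.

```java
import java.time.Duration;
import java.util.Set;

// Hypothetical self-fencing rule for the node.js side, written in Java for illustration.
final class SelfFence {

    /**
     * Returns true if the server should terminate itself.
     *
     * @param myInstance      Helix instance name of this server's paired participant
     * @param liveInstances   last observed children of LIVEINSTANCES
     * @param disconnectedFor how long this process's own ZooKeeper connection has been down
     *                        (Duration.ZERO while connected)
     * @param grace           longest disconnect tolerated before the cached view is distrusted
     */
    static boolean shouldExit(String myInstance, Set<String> liveInstances,
                              Duration disconnectedFor, Duration grace) {
        if (disconnectedFor.compareTo(grace) > 0) {
            // The cached LIVEINSTANCES view is stale, so ownership of the task can't be proven.
            return true;
        }
        // Connected (or only briefly disconnected): exit if the participant is gone.
        return !liveInstances.contains(myInstance);
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("host1_12000");
        System.out.println(shouldExit("host1_12000", live, Duration.ZERO, Duration.ofSeconds(5))); // false
        System.out.println(shouldExit("host1_12001", live, Duration.ZERO, Duration.ofSeconds(5))); // true
    }
}
```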
>>>> Regards,
>>>> Santiago
>>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>> We have a working prototype of basically something like #2 you
>>>>> proposed above. We're using the standard Helix participant, and in the
>>>>> @Transition methods of the state model we send commands to node.js via HTTP.
>>>>> I want to run you through our general architecture to make sure we are
>>>>> not violating anything on the Helix side. As a reminder, what we need to
>>>>> guarantee is that at any given time one and only one node.js process is in
>>>>> charge of a task.
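A minimal sketch of the HTTP bridge described above, using the JDK 11 java.net.http client. It assumes a hypothetical node.js endpoint POST /transition?partition=...&from=...&to=... and that a 2xx response means the transition succeeded. The Helix @Transition method itself is omitted so the block compiles without Helix on the classpath; NodeBridge is an illustrative name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical bridge from a participant's transition callbacks to a local node.js server.
// The endpoint layout (POST /transition?partition=..&from=..&to=..) is an assumption.
final class NodeBridge {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final String baseUrl; // e.g. "http://localhost:3000", no trailing slash

    NodeBridge(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Builds the POST asking node.js to move a partition from one state to another. */
    HttpRequest transitionRequest(String partition, String from, String to) {
        String query = "partition=" + enc(partition) + "&from=" + enc(from) + "&to=" + enc(to);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/transition?" + query))
                .timeout(Duration.ofSeconds(5))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends the request and throws unless node.js answers 2xx, so the transition fails loudly. */
    void apply(String partition, String from, String to) throws IOException, InterruptedException {
        HttpResponse<String> resp = client.send(
                transitionRequest(partition, from, to), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() / 100 != 2) {
            throw new IOException("node.js rejected " + from + "->" + to + " for "
                    + partition + ": HTTP " + resp.statusCode());
        }
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NodeBridge bridge = new NodeBridge("http://localhost:3000");
        // Shows the request an OFFLINE -> ONLINE transition for partition db_0 would send.
        System.out.println(bridge.transitionRequest("db_0", "OFFLINE", "ONLINE").uri());
    }
}
```

Inside an @Transition(from = "OFFLINE", to = "ONLINE") handler one would call bridge.apply(partitionName, "OFFLINE", "ONLINE"). Throwing on a non-2xx reply makes the failure surface as a failed transition (which, as far as we understand, Helix reports as an ERROR state) rather than silently recording a state node.js never reached.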
>>>>> 1. A machine with N cores will have N (pending testing) node.js
>>>>> processes running.
>>>>> 2. Associated with the N node processes are also N Helix
>>>>> participants (separate JVM instances; the reason for this comes later).
>>>>> 3. A separate Helix controller will be running on the machine and will
>>>>> just leader-elect between machines.
>>>>> 4. The spectator router will likely be HAProxy, and thus the Linux host
>>>>> will run a JVM to serve as the Helix spectator.
>>>>> 5. The state machine for each will simply be the ONLINEOFFLINE model.
>>>>> (However, I do get error messages saying that I haven't defined an OFFLINE
>>>>> to DROPPED transition; I was going to ask you about this, but it is a minor
>>>>> detail compared to the rest of the architecture.)
>>>>> 5. A simple Bash script will serve as a watchdog on each node.js and
>>>>> Helix participant pair. If either of the two is "dead", the other process
>>>>> is immediately SIGKILLed, hence the need for one JVM serving as a Helix
>>>>> participant for every node.js process.
>>>>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>>>>> ZooKeeper as an extra safety blanket. If it finds that it is NOT among the
>>>>> live instances, it likely means that its JVM participant lost its connection
>>>>> to ZooKeeper while the JVM process is still running, so the Bash script has
>>>>> not terminated the node server. In this case the node server must end its
>>>>> own process.
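The watchdog in the second item 5 is a Bash script; the same decision logic is sketched here in Java using ProcessHandle (Java 9+), whose destroyForcibly() sends SIGKILL on Linux. The PID arguments, the one-second polling interval and the PairWatchdog name are assumptions for illustration.

```java
import java.util.Optional;

// Hypothetical Java version of the per-pair watchdog (the original uses a Bash script).
final class PairWatchdog {

    enum Action { OK, KILL_NODE, KILL_PARTICIPANT, BOTH_DEAD }

    /** Pure decision from the liveness of the two halves of one pair. */
    static Action decide(boolean nodeAlive, boolean participantAlive) {
        if (nodeAlive && participantAlive) return Action.OK;
        if (nodeAlive) return Action.KILL_NODE;               // participant died: fence node.js
        if (participantAlive) return Action.KILL_PARTICIPANT; // node.js died: end its Helix session
        return Action.BOTH_DEAD;                              // nothing left to kill
    }

    /** One watchdog pass: looks up both PIDs and force-kills the survivor of a broken pair. */
    static Action check(long nodePid, long participantPid) {
        Optional<ProcessHandle> node = ProcessHandle.of(nodePid);
        Optional<ProcessHandle> participant = ProcessHandle.of(participantPid);
        Action action = decide(node.map(ProcessHandle::isAlive).orElse(false),
                               participant.map(ProcessHandle::isAlive).orElse(false));
        if (action == Action.KILL_NODE) {
            node.ifPresent(ProcessHandle::destroyForcibly);
        } else if (action == Action.KILL_PARTICIPANT) {
            participant.ifPresent(ProcessHandle::destroyForcibly);
        }
        return action;
    }

    /** Usage: java PairWatchdog <nodePid> <participantPid> */
    public static void main(String[] args) throws InterruptedException {
        long nodePid = Long.parseLong(args[0]);
        long participantPid = Long.parseLong(args[1]);
        Action action;
        do {
            Thread.sleep(1000); // assumed polling interval
            action = check(nodePid, participantPid);
        } while (action == Action.OK);
        System.out.println("pair broken: " + action);
    }
}
```

Separating decide() from the process handling keeps the pairing rule testable without spawning processes.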
>>>>> Thank you for all your help.
>>>>> Sincerely,
>>>>> Lance
>>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>>> Hi Lance,
>>>>>> Thanks for your interest in Helix. There are two possible approaches:
>>>>>> 1. Similar to what you suggested: write a Helix participant in a
>>>>>> non-JVM language, which in your case is node.js. There seem to be quite a
>>>>>> few node.js implementations that can interact with ZooKeeper.
>>>>>> The participant does the following (you got it right, but I am providing
>>>>>> the sequence):
>>>>>>    1. Creates an ephemeral node under LIVEINSTANCES.
>>>>>>    2. Watches the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>>    transitions.
>>>>>>    3. After a transition is completed, updates its current state.
>>>>>> The controller does most of the heavy lifting of ensuring that these
>>>>>> transitions lead to the desired configuration. It's quite easy to
>>>>>> re-implement this in any other language; the most difficult part would be
>>>>>> the ZooKeeper binding. We have used the Java bindings and they are solid.
>>>>>> This is at a very high level; there are some more details I have left
>>>>>> out, like handling connection loss, session expiry, etc., that will
>>>>>> require some thought.
>>>>>> 2. The other option is to use the Helix agent as a proxy. We added the
>>>>>> Helix agent as part of 0.6.1 and haven't documented it yet. Here is the
>>>>>> gist of what it does. Think of it as a generic state transition handler:
>>>>>> you can configure Helix to run a specific system command as part of each
>>>>>> transition. The Helix agent is a separate process that runs alongside the
>>>>>> actual process. Instead of the actual process receiving the transition,
>>>>>> the agent receives it, and as part of handling the transition the Helix
>>>>>> agent can invoke APIs on the actual process via RPC, HTTP, etc. The Helix
>>>>>> agent acts as a proxy to the actual process.
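The three participant steps in approach 1 can be modeled without a real ZooKeeper connection. This in-memory sketch is deliberately simplified and hypothetical: paths are flattened, a message is assumed to be the string "partition:fromState:toState", the current state is stored per partition rather than in Helix's real per-resource record, and the message is deleted after it is applied. It mainly illustrates why the ephemeral LIVEINSTANCES node doubles as a health signal: expiring the session removes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified in-memory model of the participant protocol.
// No ZooKeeper client is used; paths and the message format are assumptions.
final class ParticipantModel {

    /** Stand-in for the ZooKeeper tree: path -> data, plus the session owning each ephemeral node. */
    static final class FakeZk {
        final Map<String, String> nodes = new TreeMap<>();
        final Map<String, Long> ephemeralOwner = new HashMap<>();

        void createEphemeral(String path, String data, long session) {
            nodes.put(path, data);
            ephemeralOwner.put(path, session);
        }

        /** Session expiry deletes every ephemeral node that session created. */
        void expire(long session) {
            ephemeralOwner.entrySet().removeIf(e -> {
                if (e.getValue() != session) return false;
                nodes.remove(e.getKey());
                return true;
            });
        }

        /** Names of the direct children of parent. */
        List<String> children(String parent) {
            String prefix = parent + "/";
            List<String> out = new ArrayList<>();
            for (String p : nodes.keySet()) {
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    out.add(p.substring(prefix.length()));
                }
            }
            return out;
        }
    }

    /** One pass of the participant loop. */
    static void runOnce(FakeZk zk, String cluster, String instance, long session) {
        // Step 1: register as live with an ephemeral node tied to this session.
        String live = "/" + cluster + "/LIVEINSTANCES/" + instance;
        if (!zk.nodes.containsKey(live)) {
            zk.createEphemeral(live, "session=" + session, session);
        }
        // Step 2: pick up pending transition messages addressed to this instance.
        String msgDir = "/" + cluster + "/INSTANCES/" + instance + "/MESSAGES";
        for (String id : zk.children(msgDir)) {
            String[] m = zk.nodes.get(msgDir + "/" + id).split(":");
            // A real participant would run its state-transition handler for m[1] -> m[2] here.
            // Step 3: record the new current state, then consume the message.
            zk.nodes.put("/" + cluster + "/INSTANCES/" + instance + "/CURRENTSTATES/"
                    + session + "/" + m[0], m[2]);
            zk.nodes.remove(msgDir + "/" + id);
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.nodes.put("/c/INSTANCES/p1/MESSAGES/m1", "db_0:OFFLINE:ONLINE");
        runOnce(zk, "c", "p1", 42L);
        System.out.println(zk.nodes.keySet()); // live node and current state; message consumed
        zk.expire(42L);
        System.out.println(zk.nodes.containsKey("/c/LIVEINSTANCES/p1")); // false
    }
}
```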
>>>>>> I have another approach and will try to write it up tonight, but
>>>>>> before that I have a few questions:
>>>>>>    1. How many node.js servers run on each node: one or more than one?
>>>>>>    2. Is the spectator/router Java-based or non-Java-based?
>>>>>>    3. Can you provide more details about your state machine?
>>>>>> thanks,
>>>>>> Kishore G
>>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>>>>> tremendous job with Helix. We are looking to use it to manage a cluster
>>>>>>> primarily running Node.js. Our model for using Helix would be to
>>>>>>> have node.js or some other non-JVM library be the *Participants*, the
>>>>>>> router be a *Spectator*, and another set of machines serve as the
>>>>>>> *Controllers* (pending testing, we may just run master-slave
>>>>>>> controllers on the same instances as the Participants). The participants
>>>>>>> will interact with ZooKeeper in two ways: one is to receive
>>>>>>> state transition messages through the instance of the HelixManager
>>>>>>> <Participant>, and the other is to interact directly with ZooKeeper just to
>>>>>>> maintain ephemeral nodes within /INSTANCES. Maintaining ephemeral nodes
>>>>>>> directly in ZooKeeper would be done instead of using InstanceConfig and
>>>>>>> calling addInstance on HelixAdmin, because of the basic health checking
>>>>>>> baked into maintaining ephemeral nodes. Otherwise we would have to write
>>>>>>> a health checker between Node.js and the JVM running the Participant.
>>>>>>> Are there better alternatives for non-JVM Helix participants? I corresponded
>>>>>>> with Kishore briefly and he mentioned HelixAgents, specifically the
>>>>>>> ProcessMonitorThread that came out in the last release.
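Without ephemeral nodes, the health checker mentioned above would look something like a heartbeat table with a timeout, which is essentially the bookkeeping a ZooKeeper session timeout provides for free. A minimal sketch, assuming a hypothetical HeartbeatMonitor class with caller-supplied timestamps so the logic is deterministic and easy to test:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical heartbeat checker between a node.js server and its participant JVM.
// Each side reports periodically; a side that misses the timeout is treated as dead.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat from who at time nowMillis. */
    synchronized void beat(String who, long nowMillis) {
        lastSeen.put(who, nowMillis);
    }

    /** A process that never reported, or last reported more than timeoutMillis ago, counts as dead. */
    synchronized boolean isAlive(String who, long nowMillis) {
        Long last = lastSeen.get(who);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3_000);
        monitor.beat("node-js-1", 1_000);
        System.out.println(monitor.isAlive("node-js-1", 3_500)); // true
        System.out.println(monitor.isAlive("node-js-1", 4_500)); // false
    }
}
```

A real checker would call this with System.currentTimeMillis() on a timer and kill the partner process once isAlive turns false; ephemeral nodes let ZooKeeper handle that timing instead.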
>>>>>>> Thank you very much!
>>>>>>>  Lance Co Ting Keh
