helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: General Architecture built around Helix
Date Tue, 18 Jun 2013 16:13:50 GMT
My bad, I didn't realize that you needed HelixAdmin to actually create the
cluster. Please file a bug; the fix is quite simple.

thanks,
Kishore G


On Tue, Jun 18, 2013 at 9:00 AM, Lance Co Ting Keh <lance@box.com> wrote:

> Thanks, Kishore. Would you like me to file a bug for the first
> solution?
>
> Also, with the use of the factory, I get the following error message:
> [error] org.apache.helix.HelixException: Initial cluster structure is not
> set up for cluster: dev-box-cluster
>
> It seems it did not create the appropriate znodes for me. Was there
> something I was supposed to initialize before calling the factory?
>
> Thank you
> Lance
>
>
> On Mon, Jun 17, 2013 at 8:09 PM, kishore g <g.kishore@gmail.com> wrote:
>
>> Hi Lance,
>>
>> Looks like we are not setting the connection timeout while connecting to
>> ZooKeeper in ZKHelixAdmin.
>>
>> The fix is to change line 99 in ZKHelixAdmin.java from
>>
>>     _zkClient = new ZkClient(zkAddress);
>>
>> to
>>
>>     _zkClient = new ZkClient(zkAddress, timeout * 1000);
>>
>> Another workaround is to use HelixManager to get a HelixAdmin:
>>
>>     HelixManager manager = HelixManagerFactory.getZKHelixManager(cluster, "Admin",
>>         InstanceType.ADMINISTRATOR, zkAddress);
>>     manager.connect();
>>     // method name as spelled in the Helix API
>>     HelixAdmin admin = manager.getClusterManagmentTool();
>>
>> This will wait for 60 seconds before failing.
>> Thanks,
>> Kishore G
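Lance's question about limiting reconnects and the "wait for 60 seconds before failing" behavior both come down to bounding connection attempts by a deadline and an attempt count. The following is a minimal, stdlib-only Java sketch under that assumption; `BoundedConnect` and its parameters are hypothetical and are not part of Helix or ZkClient:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Helix or ZkClient API): retry a connection attempt
// until it succeeds, the attempt budget is used up, or the deadline passes.
final class BoundedConnect {
    private BoundedConnect() {}

    /**
     * @param attempt     one connection attempt; returns true on success
     * @param timeoutMs   total time budget, e.g. 60_000 for "60 seconds"
     * @param maxAttempts hard cap on the number of attempts
     * @param backoffMs   pause between failed attempts
     * @return the number of attempts that were needed
     * @throws IllegalStateException if no attempt succeeded within the limits
     */
    static int connect(BooleanSupplier attempt, long timeoutMs, int maxAttempts, long backoffMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            // Give up instead of retrying forever once either limit is reached.
            if (i == maxAttempts || System.nanoTime() - deadline >= 0) {
                break;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while connecting", e);
            }
        }
        throw new IllegalStateException(
            "could not connect within " + timeoutMs + " ms or " + maxAttempts + " attempts");
    }
}
```

In practice `attempt` might wrap constructing the ZooKeeper client with an explicit connection timeout, as the `timeout * 1000` argument in the suggested fix does; the attempt cap makes "just have it fail" explicit rather than relying on a library default.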
>>
>>
>> On Mon, Jun 17, 2013 at 6:15 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>
>>> Thank you, Kishore. I'll definitely test the memory consumption of one JVM
>>> per node.js server first. If it's too much, we'll likely adopt your
>>> proposed design but execute kills via the OS, to ensure there are no rogue
>>> servers.
>>>
>>> I have a small implementation question. When calling new ZKHelixAdmin
>>> (val admin = new ZKHelixAdmin("")) and the connection fails, it retries
>>> again and again indefinitely. Is there a method I can override to limit
>>> the number of reconnects and just have it fail?
>>>
>>>
>>>
>>> Lance
>>>
>>>
>>> On Sun, Jun 16, 2013 at 11:56 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Hi Lance,
>>>>
>>>> Looks good to me. Having a JVM per node.js server might add additional
>>>> overhead; you should definitely run this with a production configuration
>>>> and ensure that it does not impact performance. If you find it consuming
>>>> too many resources, you can try this approach:
>>>>
>>>>    1. Have one agent per node.
>>>>    2. Instead of creating a separate Helix agent per node.js process, you
>>>>    can create multiple participants within the same agent. Each
>>>>    participant will represent a node.js process.
>>>>    3. The monitoring of participant LIVEINSTANCES and the killing of
>>>>    node.js processes can be done by one of the Helix agents. Create
>>>>    another resource using the leader-standby model. Only one Helix agent
>>>>    will be the leader; it will monitor the LIVEINSTANCES, and if any Helix
>>>>    agent dies, it can ask the affected node.js servers to kill themselves
>>>>    (you can use HTTP or any other mechanism of your choice). The idea here
>>>>    is to designate one leader in the system to ensure that each
>>>>    helix-agent and node.js process act like a pair.
>>>>
>>>> Try this only if you find that the JVM overhead is significant with the
>>>> approach you have listed.
>>>>
>>>> Thanks,
>>>> Kishore G
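Step 3 of the single-agent design reduces to a set difference that only the current leader evaluates: any node.js process whose participant is missing from LIVEINSTANCES should be told to exit. A stdlib-only sketch, assuming a hypothetical mapping from process to participant name (`LeaderMonitor` and `killRequests` are illustrative names); how the kill is delivered, over HTTP or otherwise, is left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the leader's check: each node.js process is represented
// by one participant; if that participant is gone from LIVEINSTANCES, the
// process must be asked to terminate.
final class LeaderMonitor {
    private LeaderMonitor() {}

    /**
     * @param isLeader      true only on the agent holding the leader role of the leader-standby resource
     * @param participantOf node.js process id -> participant (instance) name representing it
     * @param liveInstances participant names currently present under LIVEINSTANCES
     * @return processes the leader should ask to terminate, sorted for determinism
     */
    static List<String> killRequests(boolean isLeader, Map<String, String> participantOf,
                                     Set<String> liveInstances) {
        List<String> toKill = new ArrayList<>();
        if (!isLeader) {
            return toKill; // standbys never act
        }
        for (Map.Entry<String, String> e : new TreeMap<>(participantOf).entrySet()) {
            if (!liveInstances.contains(e.getValue())) {
                toKill.add(e.getKey()); // its participant (and likely its agent) is gone
            }
        }
        return toKill;
    }
}
```

Gating the check on `isLeader` is what the leader-standby resource provides: only one agent ever issues kills, so two agents never race on the same process.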
>>>>
>>>>
>>>> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>
>>>>> Thank you for your advice, Santiago. That is certainly part of the
>>>>> design as well.
>>>>>
>>>>>
>>>>> Best,
>>>>> Lance
>>>>>
>>>>>
>>>>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <santip@santip.com.ar> wrote:
>>>>>
>>>>>> Helix user here (not a developer), so take my words with a grain of
>>>>>> salt.
>>>>>>
>>>>>> Regarding 6, you might want to consider the behavior of the node.js
>>>>>> instance if that instance itself loses its connection to ZooKeeper;
>>>>>> you'll probably want to kill it too, otherwise it could miss the fact
>>>>>> that the JVM lost its connection as well.
>>>>>>
>>>>>> Regards,
>>>>>> Santiago
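Santiago's point combined with item 6 of Lance's architecture list amounts to a self-fencing rule for each node.js server: keep running only while its own ZooKeeper session is connected and its participant is listed in LIVEINSTANCES. The rule is sketched in Java to match the rest of the thread (the real check would live in node.js); `SelfFence`, `Fence`, and the grace period are hypothetical:

```java
// Hypothetical self-fencing rule for a node.js server, written in Java here.
// The server keeps running only while it can see that its participant is alive.
enum Fence { KEEP_RUNNING, EXIT }

final class SelfFence {
    private SelfFence() {}

    /**
     * @param sessionConnected      is this process's own ZooKeeper session connected?
     * @param disconnectedForMs     how long it has been disconnected (0 if connected)
     * @param graceMs               how long a disconnect is tolerated before exiting
     * @param listedInLiveInstances is the paired participant present under LIVEINSTANCES?
     */
    static Fence decide(boolean sessionConnected, long disconnectedForMs, long graceMs,
                        boolean listedInLiveInstances) {
        if (!sessionConnected) {
            // Without a session the LIVEINSTANCES watch cannot fire, so the server
            // would not notice its participant dying; a long disconnect is fatal.
            return disconnectedForMs >= graceMs ? Fence.EXIT : Fence.KEEP_RUNNING;
        }
        // Connected: the participant's absence from LIVEINSTANCES means it lost its session.
        return listedInLiveInstances ? Fence.KEEP_RUNNING : Fence.EXIT;
    }
}
```

With `graceMs = 0` this is the strict behavior Santiago suggests; a small grace period tolerates brief network blips at the cost of a short window in which the server may act on stale information.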
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>>>
>>>>>>> We have a working prototype of basically something like the #2 you
>>>>>>> proposed above. We're using the standard Helix participant, and in the
>>>>>>> @Transition handlers of the state model we send commands to node.js via
>>>>>>> HTTP.
>>>>>>>
>>>>>>> I want to run you through our general architecture to make sure we are
>>>>>>> not violating anything on the Helix side. As a reminder, what we need
>>>>>>> to guarantee is that at any given time one and only one node.js process
>>>>>>> is in charge of a task.
>>>>>>>
>>>>>>> 1. A machine with N cores will have N (pending testing) node.js
>>>>>>> processes running.
>>>>>>> 2. Associated with the N node.js processes are N Helix participants
>>>>>>> (separate JVM instances; the reason for this comes later).
>>>>>>> 3. A separate Helix controller will be running on the machine and will
>>>>>>> just leader-elect between machines.
>>>>>>> 4. The spectator router will likely be HAProxy, so a JVM running on
>>>>>>> that Linux host will serve as the Helix spectator.
>>>>>>> 5. The state model for each will simply be ONLINEOFFLINE. (However, I
>>>>>>> do get error messages saying that I haven't defined an OFFLINE to
>>>>>>> DROPPED transition. I was going to ask you about this, but it is a
>>>>>>> minor detail compared to the rest of the architecture.)
>>>>>>> 5. A simple Bash script will serve as a watchdog on each node.js and
>>>>>>> Helix participant pair. If either of the two is "dead", the other
>>>>>>> process must immediately be SIGKILLed, hence the need for one JVM
>>>>>>> serving as a Helix participant for every node.js process.
>>>>>>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>>>>>>> ZooKeeper as an extra safety blanket. If it finds that it is NOT among
>>>>>>> the live instances, that likely means its JVM participant lost its
>>>>>>> connection to ZooKeeper while the JVM process is still running, so the
>>>>>>> Bash script has not terminated the node server. In this case the node
>>>>>>> server must end its own process.
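The error mentioned in the first item 5 suggests the ONLINEOFFLINE model also expects an OFFLINE to DROPPED transition. A stdlib-only sketch of a transition dispatcher that reports which required transitions have no handler; it assumes the model's transitions are OFFLINE->ONLINE, ONLINE->OFFLINE and OFFLINE->DROPPED, and `OnlineOfflineHandlers` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher for an ONLINEOFFLINE participant. The REQUIRED list is an
// assumption; OFFLINE->DROPPED is the handler whose absence matches the error above.
final class OnlineOfflineHandlers {
    static final List<String> REQUIRED =
        List.of("OFFLINE->ONLINE", "ONLINE->OFFLINE", "OFFLINE->DROPPED");

    private final Map<String, Consumer<String>> handlers = new LinkedHashMap<>();

    /** Registers the action for one transition; the argument is the partition name. */
    void on(String from, String to, Consumer<String> handler) {
        handlers.put(from + "->" + to, handler);
    }

    /** Transitions the model requires but no handler covers. */
    List<String> missing() {
        List<String> out = new ArrayList<>();
        for (String t : REQUIRED) {
            if (!handlers.containsKey(t)) out.add(t);
        }
        return out;
    }

    /** Runs the handler for one partition, failing loudly if none is registered. */
    void apply(String from, String to, String partition) {
        Consumer<String> h = handlers.get(from + "->" + to);
        if (h == null) throw new IllegalStateException("no handler for " + from + "->" + to);
        h.accept(partition);
    }
}
```

In a Java participant built on Helix, the corresponding fix would be one more `@Transition(from = "OFFLINE", to = "DROPPED")` method alongside the existing handlers, presumably telling node.js to release the partition.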
>>>>>>>
>>>>>>> Thank you for all your help.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Lance
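The watchdog rule in the second item 5 (if either member of a node.js/participant pair dies, SIGKILL the other) is small enough to state as a decision table. The thread uses a Bash script; the following is a hypothetical Java equivalent (`PairWatchdog` is an illustrative name) that uses `ProcessHandle` (Java 9+) for the liveness check and the kill:

```java
import java.util.Optional;

// Hypothetical pair watchdog: a node.js process and its Helix participant JVM
// live or die together.
final class PairWatchdog {
    private PairWatchdog() {}

    /** Which member of the pair must be killed, if any. */
    enum Action { NONE, KILL_NODE, KILL_PARTICIPANT, BOTH_DEAD }

    static Action decide(boolean nodeAlive, boolean participantAlive) {
        if (nodeAlive && participantAlive) return Action.NONE;
        if (nodeAlive) return Action.KILL_NODE;               // participant died first
        if (participantAlive) return Action.KILL_PARTICIPANT; // node.js died first
        return Action.BOTH_DEAD;
    }

    /** Checks both OS processes and forcibly kills the survivor. */
    static Action check(long nodePid, long participantPid) {
        Optional<ProcessHandle> node = ProcessHandle.of(nodePid).filter(ProcessHandle::isAlive);
        Optional<ProcessHandle> participant =
            ProcessHandle.of(participantPid).filter(ProcessHandle::isAlive);
        Action a = decide(node.isPresent(), participant.isPresent());
        // destroyForcibly() sends SIGKILL on Unix-like systems.
        if (a == Action.KILL_NODE) node.get().destroyForcibly();
        if (a == Action.KILL_PARTICIPANT) participant.get().destroyForcibly();
        return a;
    }
}
```

Keeping the decision separate from the OS calls lets it be tested without spawning processes.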
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Lance,
>>>>>>>>
>>>>>>>> Thanks for your interest in Helix. There are two possible approaches:
>>>>>>>>
>>>>>>>> 1. Similar to what you suggested: write a Helix participant in a
>>>>>>>> non-JVM language, which in your case is node.js. There seem to be
>>>>>>>> quite a few node.js libraries that can interact with ZooKeeper. A
>>>>>>>> Helix participant does the following (you got it right, but I am
>>>>>>>> providing the right sequence):
>>>>>>>>
>>>>>>>>    1. Creates an ephemeral node under LIVEINSTANCES
>>>>>>>>    2. Watches the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>>>>    transitions
>>>>>>>>    3. After a transition is completed, updates
>>>>>>>>    /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE
>>>>>>>>
>>>>>>>> The controller does most of the heavy lifting of ensuring that these
>>>>>>>> transitions lead to the desired configuration. It's quite easy to
>>>>>>>> re-implement this in any other language; the most difficult part
>>>>>>>> would be the ZooKeeper binding. We have used the Java bindings and
>>>>>>>> they are solid. This is at a very high level; there are some more
>>>>>>>> details I have left out, like handling connection loss/session
>>>>>>>> expiry, that will require some thinking.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. The other option is to use the Helix agent as a proxy. We added
>>>>>>>> the Helix agent as part of 0.6.1, but we haven't documented it yet.
>>>>>>>> Here is the gist of what it does. Think of it as a generic state
>>>>>>>> transition handler: you can configure Helix to run a specific system
>>>>>>>> command as part of each transition. The Helix agent is a separate
>>>>>>>> process that runs alongside your actual process. Instead of the
>>>>>>>> actual process getting the transition, the Helix agent gets it, and
>>>>>>>> as part of the transition it can invoke APIs on the actual process
>>>>>>>> via RPC, HTTP, etc. The Helix agent simply acts as a proxy to the
>>>>>>>> actual process.
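The three participant steps in option 1 can be sketched against an in-memory stand-in for ZooKeeper. Everything here is hypothetical: `FakeZk`, `MiniParticipant`, the `partition:FROM:TO` message format, and the paths, which follow the thread's shorthand without a cluster-name prefix. Real code would use ephemeral znodes and watches instead of direct map access:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for ZooKeeper: path -> data.
final class FakeZk {
    final Map<String, String> nodes = new TreeMap<>();

    /** Direct children of a path, by last path segment. */
    List<String> children(String parent) {
        List<String> out = new ArrayList<>();
        String prefix = parent + "/";
        for (String p : nodes.keySet()) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                out.add(p.substring(prefix.length()));
            }
        }
        return out;
    }
}

// Sketch of the participant sequence: announce liveness, consume transition
// messages, record the resulting state.
final class MiniParticipant {
    private final FakeZk zk;
    private final String name;

    MiniParticipant(FakeZk zk, String name) {
        this.zk = zk;
        this.name = name;
    }

    // Step 1: announce liveness (an ephemeral node in real ZooKeeper).
    void join() {
        zk.nodes.put("/LIVEINSTANCES/" + name, "alive");
    }

    // Steps 2 and 3: handle pending messages, then update the current state.
    int processMessages() {
        String inbox = "/INSTANCES/" + name + "/MESSAGES";
        int handled = 0;
        for (String id : zk.children(inbox)) {
            String[] msg = zk.nodes.remove(inbox + "/" + id).split(":");
            String partition = msg[0];
            String toState = msg[2]; // msg[1] is the from-state; a real participant would validate it
            // ...perform the actual transition here, e.g. call the node.js process...
            zk.nodes.put("/INSTANCES/" + name + "/CURRENTSTATE/" + partition, toState);
            handled++;
        }
        return handled;
    }
}
```

What the sketch leaves out is exactly what the reply flags as needing thought: re-creating the LIVEINSTANCES node and re-registering watches after connection loss or session expiry.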
>>>>>>>>
>>>>>>>> I have another approach and will try to write it up tonight, but
>>>>>>>> before that I have a few questions:
>>>>>>>>
>>>>>>>>    1. How many node.js servers run on each node, one or more than one?
>>>>>>>>    2. Is the spectator/router Java or non-Java based?
>>>>>>>>    3. Can you provide more details about your state machine?
>>>>>>>>
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Kishore G
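Option 2's "generic state transition handler" can be pictured as a table from transitions to system commands. A minimal sketch; `TransitionCommandProxy`, the configuration shape, the `{partition}` placeholder, and the example URL are assumptions for illustration, not the Helix agent's actual configuration format:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical proxy: each transition maps to a system command that the agent
// runs instead of delivering the transition to the node.js process directly.
final class TransitionCommandProxy {
    private final Map<String, List<String>> commands = new HashMap<>();

    void configure(String from, String to, List<String> commandTemplate) {
        commands.put(from + "->" + to, List.copyOf(commandTemplate));
    }

    /** Builds the command for one transition, substituting the partition name. */
    List<String> commandFor(String from, String to, String partition) {
        List<String> template = commands.get(from + "->" + to);
        if (template == null) {
            throw new IllegalStateException("no command for " + from + "->" + to);
        }
        List<String> cmd = new ArrayList<>();
        for (String arg : template) cmd.add(arg.replace("{partition}", partition));
        return cmd;
    }

    /** Runs the command; a non-zero exit code is treated as a failed transition. */
    void onTransition(String from, String to, String partition)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(commandFor(from, to, partition)).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("transition command exited with " + exit);
        }
    }
}
```

Treating the exit code as the transition's outcome lets the command, for example an HTTP call to the node.js process, decide whether the transition succeeded.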
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>>>>>>> tremendous job with Helix. We are looking to use it to manage a
>>>>>>>>> cluster primarily running Node.js. Our model for using Helix would
>>>>>>>>> be to have node.js or some other non-JVM library be *Participants*,
>>>>>>>>> a router as a *Spectator*, and another set of machines to serve as
>>>>>>>>> the *Controllers* (pending testing, we may just run master-slave
>>>>>>>>> controllers on the same instances as the Participants). The
>>>>>>>>> participants will interact with ZooKeeper in two ways: one is to
>>>>>>>>> receive Helix state transition messages through the instance of the
>>>>>>>>> HelixManager <Participant>, and the other is to interact directly
>>>>>>>>> with ZooKeeper just to maintain ephemeral nodes within /INSTANCES.
>>>>>>>>> Maintaining ephemeral nodes directly in ZooKeeper would be done
>>>>>>>>> instead of using InstanceConfig and calling addInstance on
>>>>>>>>> HelixAdmin because of the basic health checking baked into
>>>>>>>>> ephemeral nodes. Otherwise, we would have to write a health checker
>>>>>>>>> for Node.js and the JVM running the Participant. Are there better
>>>>>>>>> alternatives for non-JVM Helix participants? I corresponded with
>>>>>>>>> Kishore briefly and he mentioned HelixAgents, specifically the
>>>>>>>>> ProcessMonitorThread that came out in the last release.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you very much!
>>>>>>>>>
>>>>>>>>>  Lance Co Ting Keh
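The "basic health checking baked into ephemeral nodes" works because an ephemeral node exists only as long as the session that created it. A hypothetical in-memory model of that behavior (`EphemeralRegistry` and the paths are illustrative); note that real ZooKeeper watches are one-shot and must be re-registered, which this model does not capture:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of ephemeral-node semantics: a node lives exactly as long as
// the session that created it, and watchers hear about its deletion.
final class EphemeralRegistry {
    private final Map<String, Long> ownerSession = new HashMap<>(); // path -> session id
    private final List<Consumer<String>> deletionWatchers = new ArrayList<>();

    void createEphemeral(String path, long sessionId) {
        if (ownerSession.putIfAbsent(path, sessionId) != null) {
            throw new IllegalStateException("node exists: " + path);
        }
    }

    // Persistent for simplicity; real ZooKeeper watches fire once.
    void watchDeletions(Consumer<String> watcher) {
        deletionWatchers.add(watcher);
    }

    boolean exists(String path) {
        return ownerSession.containsKey(path);
    }

    /** Session expiry removes every ephemeral node the session owns. */
    void expireSession(long sessionId) {
        List<String> gone = new ArrayList<>();
        ownerSession.entrySet().removeIf(e -> {
            boolean owned = e.getValue() == sessionId;
            if (owned) gone.add(e.getKey());
            return owned;
        });
        for (String path : gone) {
            for (Consumer<String> w : deletionWatchers) w.accept(path);
        }
    }
}
```

No process has to report its own death: when a session expires, its nodes disappear and every watcher learns about it, which is the health check the ephemeral approach gets for free.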
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
