helix-user mailing list archives

From Vinayak Borkar <vbo...@yahoo.com>
Subject Re: Dynamically configuring instances
Date Tue, 26 Feb 2013 17:48:04 GMT
Hi Kishore,

Thanks for creating the JIRA. I will respond to this mail here, but 
please let me know if you would prefer to continue the discussion on 
the JIRA going forward.
> The reasoning behind having a consistent naming scheme is to provide a
> consistent mechanism of assigning partitions to nodes even after restarts.
> This is important for stateful systems where we don't want to move the data

I see the need for stability in instance naming to avoid a complete 
reshuffle on cluster restart. However, IMO this is a consequence of 
Helix's design of having ZooKeeper be the single source of truth even 
when the cluster is not running.

Let's say Helix had an alternate approach:

While the cluster is running, ZooKeeper would still be used as the 
source of truth for the locations of resource partitions. On cluster 
startup, however, ZK would begin with a clean slate that is 
incrementally populated as instances join the cluster, based on the 
partitions each instance reports during the "join" process. From that 
point on, Helix would continue doing what it does today.
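To make the idea concrete, here is a minimal sketch of that join-time 
repopulation. All class and method names (PartitionRegistry, onJoin, 
ownerOf) are hypothetical illustrations, not Helix APIs; an in-memory 
map stands in for the ZK-backed mapping:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the source of truth starts empty after a full-cluster
// restart and is populated as instances join, each reporting the
// partitions it already holds, so no data has to move.
public class PartitionRegistry {
    // partition name -> owning instance, built up incrementally
    private final Map<String, String> assignment = new ConcurrentHashMap<>();

    // Called during an instance's "join": record the partitions it
    // reports as already present on its local storage.
    public void onJoin(String instanceName, List<String> reportedPartitions) {
        for (String p : reportedPartitions) {
            assignment.putIfAbsent(p, instanceName);
        }
    }

    public Optional<String> ownerOf(String partition) {
        return Optional.ofNullable(assignment.get(partition));
    }

    public static void main(String[] args) {
        PartitionRegistry registry = new PartitionRegistry();
        // Two instances rejoin after a restart, reporting what they hold.
        registry.onJoin("node-1", Arrays.asList("db_0", "db_2"));
        registry.onJoin("node-2", Arrays.asList("db_1", "db_3"));
        System.out.println(registry.ownerOf("db_2").orElse("unassigned"));
    }
}
```

Under this scheme the instance name only needs to be unique for the 
lifetime of the session, since the partition-to-instance mapping is 
rebuilt from what the instances themselves report.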

With this approach, instance names would matter only while the cluster 
is running and would have no stability requirement across restarts. 
However, this is a huge change for Helix, and I am sure you guys have 
probably thought about this as a possible direction - I would like to 
hear your thoughts on this topic.

> on restarts. Another (not really technical but more practical) reason is to
> avoid rogue instances connecting to the cluster with random id due to code
> bugs or misconfiguration.

I completely agree with the need to handle the rogue/misconfigured 
instances case.

> This requirement has come up multiple times at LinkedIn and on other
> threads. Will a feature like auto-create instance on join and delete on
> leave be helpful? We can have this flag set at the cluster level when the
> cluster is created, so we can throw an exception if the flag is set to false
> and the node is not already created.

While the above feature would be great for adding new instances with 
little configuration (and for zero-configuration testing), there still 
needs to be a way to handle restarting a loaded cluster without 
triggering a massive reshuffle.
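For what it's worth, here is how I read the proposed flag, as a sketch. 
The names (ClusterConfig, allowAutoJoin, join, leave) are hypothetical 
stand-ins, not Helix APIs:

```java
import java.util.*;

// Sketch of the proposed cluster-level flag: when auto-join is off,
// an unknown instance is rejected at join time, which keeps out the
// rogue/misconfigured instances; when it is on, the instance record
// is created on the fly and removed again on leave.
public class ClusterConfig {
    private final boolean allowAutoJoin;
    private final Set<String> knownInstances = new HashSet<>();

    public ClusterConfig(boolean allowAutoJoin) {
        this.allowAutoJoin = allowAutoJoin;
    }

    // Explicit pre-registration, as done by an admin today.
    public void addInstance(String name) {
        knownInstances.add(name);
    }

    // Join-time check: auto-create when the flag is set, otherwise
    // throw so unregistered instances cannot enter the cluster.
    public void join(String instanceName) {
        if (!knownInstances.contains(instanceName)) {
            if (allowAutoJoin) {
                knownInstances.add(instanceName);   // create on join
            } else {
                throw new IllegalStateException("Instance " + instanceName
                        + " is not registered and auto-join is disabled");
            }
        }
    }

    public void leave(String instanceName) {
        if (allowAutoJoin) {
            knownInstances.remove(instanceName);    // delete on leave
        }
    }

    public static void main(String[] args) {
        ClusterConfig strict = new ClusterConfig(false);
        strict.addInstance("node-1");
        strict.join("node-1");                      // pre-registered: ok
        try {
            strict.join("rogue-7");                 // unknown: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        ClusterConfig open = new ClusterConfig(true);
        open.join("node-2");                        // auto-created on join
        System.out.println("auto-joined node-2");
    }
}
```

The strict mode covers the rogue-instance concern; the permissive mode 
covers zero-configuration testing. It is the combination of the 
permissive mode with a loaded cluster restart that still looks 
unresolved to me.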

