incubator-s4-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kishore g <>
Subject Using Helix for cluster management of S4
Date Fri, 30 Nov 2012 06:28:27 GMT

I have added some additional features(mostly related to cluster management)
to S4 using Helix. Helix is generic cluster management framework that
entered Apache Incubation in October. In few words, Helix provides
automatic partition management, failure detection and handling and cluster
expansion in distributed systems.

In S4 the number of partition is fixed for all streams and is dependent on
the number of nodes in the cluster.  Adding new nodes to S4 cluster causes
the number of partitions to change. This results in lot of data movement.
For example if there are 4 nodes and you add another node then nearly all
keys will be remapped which result is huge data movement where as ideally
only 20% of the data should move.

By using Helix, every stream can be  partitioned differently and
independent of the number of nodes. Helix distributes the partitions evenly
among the nodes. When new nodes are added, partitions can be migrated to
new nodes without changing the number of partitions and  minimizes the data

In S4 handles failures by having stand by nodes that are idle most of the
time and become active when a node fails. Even though this works, its not
ideal in terms of efficient hardware usage since the stand by nodes are
idle most of the time. This also increases the fail over time since the PE
state has to be transfered to only one node.

Helix allows S4 to have Active and Standby nodes at a partition level so
that all nodes can be active but some partitions will be Active and some in
stand by mode. When a node fails, the partitions that were  Active on that
node will be evenly distributed among the remaining nodes. This provides
automatic load balancing and also improves fail over time, since PE state
can be transfered to multiple nodes in parallel.

I have a prototype implementation here

Instructions to build it and try it out are in the Readme.

More info on Helix can be found here,

Helix can provide lot of other functionalities like

   1. Configure the topology according to use case. For example, co-locate
   the partitions of different streams to allow efficient joins. Configure the
   number of standby for each partition based on the head room available.
   2. When new nodes are added, it can throttle the data movement
   3. Comes with large set of admin tools like enable/disable node,
   dynamically change the topology etc. Provides a rest interface to manage
   the cluster.
   4. Allows one to schedule custom tasks like snapshot the PE's in a
   partition and restore from the snapshot.

Would like to get your feedback.

Kishore G

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message