incubator-s4-dev mailing list archives

From Sergio Vavassori
Subject Information about Apache S4 and Helix
Date Mon, 02 Dec 2013 16:47:07 GMT
Good morning,

I have started using Apache S4 for a university project and I would like to
ask you some questions about its architecture, mainly to be sure I make the
modifications I need in the right way and to see if there is a cleaner and
simpler approach.

It's my understanding that a cluster is a group of nodes and that each node
runs a copy of the same application code; this means that if I want to
partition the ProcessingElements across nodes I need to group them into
different clusters. So, mapping S4 elements onto "classical" Stream Processing
naming (Nodes, Operators, Slides...) would mean having one application
(Operator) per cluster and configuring the ProcessingElements as singletons
(1 Slide per

About inter-cluster streaming:
Is it possible to have a broadcast stream between one cluster and all nodes
of another cluster? Or should I re-implement RemoteSenders to do that? In
the latter case, is there a way to override the Module mapping between the
interface and the class used to resolve @Inject?
Is there a way to have that feature as a per-stream configuration rather than
a cluster-wide setting for all streams?
Is there any functional difference (or limitation) between "RemoteStreams"
and "Streams" beyond the naming to recognize inter-cluster vs intra-cluster

I saw there is an ongoing integration with the Helix project, which has a
slightly different concept of partitions, since the same node can host more
than one, but I couldn't find any example. Is there any work on it?
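
To make the broadcast question above concrete, here is a minimal plain-Java
sketch of the two routing behaviours being contrasted: key-based routing, where
an event reaches exactly one partition of the target cluster, and broadcast,
where it reaches every partition. All names here (RoutingSketch, partitionFor,
broadcastTargets) are hypothetical and are not S4 or Helix classes; the
modulo-hash scheme is only an assumption used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are S4 or Helix classes.
final class RoutingSketch {

    // Key-based routing: an event goes to exactly one partition of the target cluster.
    static int partitionFor(String key, int partitionCount) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Broadcast routing: an event goes to every partition of the target cluster.
    static List<Integer> broadcastTargets(int partitionCount) {
        List<Integer> targets = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            targets.add(p);
        }
        return targets;
    }

    public static void main(String[] args) {
        int partitionCount = 3; // assumed size of the downstream cluster
        System.out.println("keyed     -> " + partitionFor("sensor-7", partitionCount));
        System.out.println("broadcast -> " + broadcastTargets(partitionCount));
    }
}
```

A per-stream setting, as asked above, would amount to choosing between these
two functions for each stream rather than once for the whole cluster.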

Sergio Vavassori
