incubator-s4-dev mailing list archives

From Leo Neumeyer <>
Subject Fwd: Few questions.
Date Wed, 06 Jun 2012 15:48:52 GMT
Posting to the list with permission...

---------- Forwarded message ----------
From: Matthieu Morel <>
Date: Wed, Jun 6, 2012 at 7:46 AM
Subject: Re: Few questions.

Hi Shailendra,

please don't hesitate to post on the public list, that will be useful
for everyone!

About partitioning:
- you partition data using a KeyFinder. See for example in the twitter example:;a=blob;f=test-apps/twitter-counter/src/main/java/org/apache/s4/example/twitter/;h=90c31994e20cc311e333ea8eb6bd1485e8b2e857;hb=S4-22#l46
- right now, if you use an adapter application in front of a consumer
application, events are broadcast to all consumer nodes. Maybe
that's what is giving you issues. We'll add a customizable policy,
with round-robin probably being the default.
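
To illustrate the idea of key-based partitioning, here is a minimal standalone sketch. It does not use the S4 classes: SimpleKeyFinder, Quote and partitionFor are hypothetical names made up for this example, and hashing the key modulo the partition count is only an assumption about how a key could be mapped to a node.

```java
import java.util.Collections;
import java.util.List;

// Standalone sketch, NOT the S4 API: every name below is hypothetical.
final class PartitioningSketch {

    /** Hypothetical event type used only for illustration. */
    static final class Quote {
        final String symbol;
        final double price;

        Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    /** Hypothetical stand-in for a key finder: extracts the key values of an event. */
    interface SimpleKeyFinder<T> {
        List<String> get(T event);
    }

    /** Keys a quote by its symbol, so all quotes for one symbol share a key. */
    static final SimpleKeyFinder<Quote> BY_SYMBOL =
            q -> Collections.singletonList(q.symbol);

    /** Assumed mapping: hash the joined key values, then take it modulo the partition count. */
    static <T> int partitionFor(T event, SimpleKeyFinder<T> finder, int partitionCount) {
        String key = String.join("^", finder.get(event));
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        Quote first = new Quote("IBM", 100.0);
        Quote second = new Quote("IBM", 101.5);
        // Both quotes carry the same key, so they land on the same partition.
        System.out.println(partitionFor(first, BY_SYMBOL, 3));
        System.out.println(partitionFor(second, BY_SYMBOL, 3));
    }
}
```

With a scheme like this, events sharing a key always reach the same partition, so no filtering at the destination is needed.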

About windowing:
- the idea is that you fill a circular and rotating buffer with slots
(in piper, you provide your own implementation), upon reception of
- you always have access to the latest slot, and you place data in that slot
- you define when new slots are generated
- you specify the size of a window, i.e. how many slots per window
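
The slot mechanics described above can be sketched in plain Java as follows. This is not the WindowingPE implementation: SlotWindow, SumSlot, rotate and latest are hypothetical names, and the slot here simply keeps a running sum as one possible "own implementation" of a slot.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Standalone sketch, NOT the S4 WindowingPE: every name below is hypothetical.
final class SlotWindowSketch {

    /** A user-provided slot implementation; this one accumulates a sum. */
    static final class SumSlot {
        private double sum;

        void add(double value) {
            sum += value;
        }

        double sum() {
            return sum;
        }
    }

    /** A window holding at most slotsPerWindow slots; the oldest slot is evicted on rotation. */
    static final class SlotWindow {
        private final int slotsPerWindow;
        private final Deque<SumSlot> slots = new ArrayDeque<>();

        SlotWindow(int slotsPerWindow) {
            if (slotsPerWindow < 1) {
                throw new IllegalArgumentException("slotsPerWindow must be >= 1");
            }
            this.slotsPerWindow = slotsPerWindow;
            slots.addLast(new SumSlot()); // there is always a latest slot
        }

        /** The caller decides when a new slot starts (for example on a timer). */
        void rotate() {
            if (slots.size() == slotsPerWindow) {
                slots.removeFirst(); // drop the oldest slot
            }
            slots.addLast(new SumSlot());
        }

        /** Incoming data is always placed in the latest slot. */
        SumSlot latest() {
            return slots.getLast();
        }

        int slotCount() {
            return slots.size();
        }

        /** Aggregate over every slot currently in the window. */
        double windowSum() {
            double total = 0.0;
            for (SumSlot slot : slots) {
                total += slot.sum();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        SlotWindow window = new SlotWindow(2);
        window.latest().add(1.0);
        window.latest().add(2.0);
        window.rotate();
        window.latest().add(10.0);
        System.out.println(window.windowSum()); // sum over both slots
    }
}
```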

In parallel, you can use a trigger to output data that you compute
from the data in the current window. (The trigger interval could
actually be a multiple of the slot duration.)
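
As a rough illustration of a trigger that fires at a multiple of the slot duration, here is a small deterministic sketch driven by explicit slot-boundary calls instead of a real timer. It is not S4 code: TriggerSketch, onEvent, onSlotBoundary and slotsPerTrigger are hypothetical names, and counting events is just one possible window computation.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch, NOT S4 code: every name below is hypothetical.
final class TriggerSketch {
    private final long[] slots;          // ring of per-slot event counts
    private final int slotsPerTrigger;   // trigger interval, in slot durations
    private final List<Long> emitted = new ArrayList<>();
    private long boundaries;             // slot boundaries seen so far
    private int latest;                  // index of the latest slot in the ring

    TriggerSketch(int slotsPerWindow, int slotsPerTrigger) {
        if (slotsPerWindow < 1 || slotsPerTrigger < 1) {
            throw new IllegalArgumentException("sizes must be >= 1");
        }
        this.slots = new long[slotsPerWindow];
        this.slotsPerTrigger = slotsPerTrigger;
    }

    /** Each incoming event is counted in the latest slot. */
    void onEvent() {
        slots[latest]++;
    }

    /** Called once per slot duration, e.g. by a timer in a real system. */
    void onSlotBoundary() {
        boundaries++;
        if (boundaries % slotsPerTrigger == 0) {
            emitted.add(windowCount()); // output computed from the current window
        }
        // Advance to the next slot, overwriting the oldest one.
        latest = (latest + 1) % slots.length;
        slots[latest] = 0;
    }

    long windowCount() {
        long total = 0;
        for (long count : slots) {
            total += count;
        }
        return total;
    }

    List<Long> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        TriggerSketch sketch = new TriggerSketch(3, 2);
        sketch.onEvent();
        sketch.onSlotBoundary();
        sketch.onEvent();
        sketch.onSlotBoundary(); // second boundary: the trigger fires
        System.out.println(sketch.emitted());
    }
}
```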

We'll add examples and documentation for that.

Hope this helps, and thanks again for the feedback!


On Wed, Jun 6, 2012 at 3:15 PM, <> wrote:
> Hi Leo, Matthieu:
> Sorry I couldn't attend yesterday's hangout session; I was on a plane. I have been
> trying to code a few quant use cases using S4 and have a few questions:
> - Consider the following topology: Input-Adaptor -> PE1, PE2, PE3 <all three are
> running the same application> -> PrintPE <which outputs the data>
> Ideally, I would like to think of PE1..3 as each processing a specific partition of the
> data, but it looks like there is no obvious way to do it. So I thought I would filter out
> stuff at the destination based on a partition-id. I could interrogate ZK and get my
> process partition (I haven't tried that, but I think it is possible). Short of that, is
> there a cheaper way of doing this? Maybe this is not a suitable approach in S4. Assuming
> that is true, let me ask: how would you partition data?
> - Now for the second question: so far I have been using the onTime and onTrigger
> methods to implement windowing in my applications, the former for wall-clock-time-based
> windows and the latter for application-time-based windows. However, I came across the
> notion of WindowingPE, which could be used instead. Would you have an example showing
> the use of WindowingPE to model what I have been doing with onTime and onTrigger?
> Would greatly appreciate any help.
> - Thanks
> - Shailendra
> This email was sent to you by Thomson Reuters, the global news and information company.
> Any views expressed in this message are those of the individual sender, except where the
> sender specifically states them to be the views of Thomson Reuters.


Leo Neumeyer (@leoneu)
