incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Curious about extra things in 0.6 compared to 0.5
Date Mon, 22 Apr 2013 13:02:36 GMT
Great questions! See my answers inline.

On Apr 22, 2013, at 14:38, saradindu kar wrote:

> Hi,
> 
> I have followed S4 since version 0.3, then 0.5 and now 0.6.
> 
> I have some experimental outcomes that I would like to clarify:
> 
> In 0.3 (PE to PE): events lost: yes ----- mean lag: yes
> In 0.5 (PE to PE): events lost: 0 (no event loss) ----- mean lag: still some lag, but less than in 0.3
> In 0.6 (PE to PE): events lost: starts around 10000, then drops to the thousands ----- mean lag: 0
> 
> So my question is: as S4 has evolved since its inception, what are the primary goals you are currently addressing?

0.5 was a complete refactoring, focused on providing a functional system on top of a new implementation.
0.6 aimed at improving performance, usability and configurability.

> Can I deploy one system with 0 loss and 0 lag, or is it rather that I should choose 0.5 or 0.6 depending on my use case?

In S4 0.6 you can define how events are processed: blocking (no loss, but senders wait), shedding (events are dropped when the input (or output) rate exceeds the processing rate), or custom (for example, depending on the stream or on some characteristic of the event itself). See the sketch below.
By default, within an S4 app, when downstream PEs cannot process events fast enough, S4 drops events upstream. But if you use an adapter, by default, senders will wait until the downstream app can process events.
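
To make the difference concrete, here is a minimal plain-Java sketch of the two policies on a bounded queue. It is illustrative only, not the actual S4 classes; the name BoundedDispatcher is made up.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustrative only: contrasts the blocking and shedding policies.
    public class BoundedDispatcher<E> {

        public enum Policy { BLOCKING, SHEDDING }

        private final BlockingQueue<E> queue;
        private final Policy policy;

        public BoundedDispatcher(int capacity, Policy policy) {
            this.queue = new ArrayBlockingQueue<E>(capacity);
            this.policy = policy;
        }

        /** Returns false only when an event is shed (dropped). */
        public boolean submit(E event) throws InterruptedException {
            switch (policy) {
                case BLOCKING:
                    queue.put(event);          // waits until there is room: no loss, but lag
                    return true;
                case SHEDDING:
                default:
                    return queue.offer(event); // drops when full: no lag, but loss
            }
        }

        public E take() throws InterruptedException {
            return queue.take();               // consumed by the processing thread
        }
    }

A custom policy would simply be a third branch that inspects the event (or the stream it belongs to) before deciding whether to block, drop, or reroute.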

> 
> For addressing the above issue I felt Storm had the upper hand over S4, but it has lower performance in terms of the number of events processed and processing speed, though that can also be improved by using more machines.
> 
> Is it correct that, since Storm uses ZeroMQ, a kind of pulling technique for handling incoming events, it doesn't incur the above problem?
> Whereas S4 doesn't use ZeroMQ; if I understood correctly, it uses a push technique for handling incoming events, so it loses events in order to maintain the queue.

That depends on how you configure the processing of the queues. By blocking upstream based on back pressure from downstream, you can avoid losing events: events won't be sent faster than the downstream system can process them.

Then it depends on your source of events. If you can pull from that source, great: the pull code can be implemented in the adapter, which then passes events to the S4 app (see the sketch below). If you cannot pull, you can maintain some buffering, but you'll probably have to drop some events at some point, and S4 provides facilities for that.
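
As a rough illustration of the pull-based adapter idea: the loop below pulls at the pace the downstream can sustain, because a blocking put() naturally pauses the polling. EventSource and StreamSink are simplified stand-ins, not the exact S4 0.6 adapter signatures.

    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of a pull-based adapter loop. A real S4 0.6 adapter
    // would inject events into a remote stream; the interfaces here are stand-ins.
    public class PullingAdapter implements Runnable {

        interface EventSource {      // stand-in for your external source (queue, cursor, ...)
            String poll(long timeout, TimeUnit unit) throws InterruptedException;
        }

        interface StreamSink {       // stand-in for the stream feeding the S4 app
            void put(String event);
        }

        private final EventSource source;
        private final StreamSink stream;
        private volatile boolean running = true;

        public PullingAdapter(EventSource source, StreamSink stream) {
            this.source = source;
            this.stream = stream;
        }

        @Override
        public void run() {
            // If stream.put() blocks (back pressure), we stop polling until it returns.
            while (running) {
                try {
                    String raw = source.poll(100, TimeUnit.MILLISECONDS);
                    if (raw != null) {
                        stream.put(raw);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    running = false;
                }
            }
        }

        public void stop() {
            running = false;
        }
    }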


> 
> Can you give me some idea of the concepts behind the queue implementation here?

More details here: http://incubator.apache.org/s4/doc/0.6.0/event_dispatch/ 

> 
> One more query, about joining multiple streams: there was a provision for joining streams in 0.3. Is there any provision in 0.6 for joining or splitting incoming streams based on their key? Currently we can do this by writing a common event class for the different event streams and using it for our processing in the PEs.
> If there is a way to do this in 0.6, can you point me to the right API?

There is no such API / facility yet, so you have to implement the corresponding logic in the code of the PE. A rough sketch of what that can look like follows.
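
For example, a join keyed on a common attribute can be hand-rolled inside the PE by buffering one side until the other arrives. This is plain-Java join logic, not the S4 PE API itself; the event classes and handler names are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical join logic one could embed in a PE instance keyed on "userId".
    // ClickEvent / PurchaseEvent and the handler names are illustrative only.
    public class JoinLogic {

        static class ClickEvent    { String userId; String page;   }
        static class PurchaseEvent { String userId; double amount; }

        // A keyed PE instance only sees events for its own key, so this buffer stays small.
        private final Map<String, ClickEvent> pendingClicks = new HashMap<String, ClickEvent>();

        public void onClick(ClickEvent click) {
            pendingClicks.put(click.userId, click);   // buffer until the matching purchase arrives
        }

        public void onPurchase(PurchaseEvent purchase) {
            ClickEvent click = pendingClicks.remove(purchase.userId);
            if (click != null) {
                emitJoined(click, purchase);          // both sides seen: emit the joined result
            }
            // else: decide whether to buffer the purchase, or drop it after a timeout
        }

        private void emitJoined(ClickEvent click, PurchaseEvent purchase) {
            System.out.println(click.page + " -> " + purchase.amount);
        }
    }

Splitting by key works the other way around: inspect the key in the handler and forward the event to one of several output streams.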

Hope this helps,

Matthieu

> 
> Thanks,
> ~/Sk

