incubator-s4-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthieu Morel <mmo...@apache.org>
Subject Re: S4 and scalability problem
Date Thu, 19 Dec 2013 13:47:08 GMT
Hi,

as long as you don't saturate the network you should more or less increase linearly the overall
throughput by adding processing nodes.

There are quite a few parameters that may influence the throughput:
- input load skewed
- related to the above: small number of keys (S4 handles thousands or millions of keys. If
you just have a few, load probably won't be balanced)
- in a shared infrastructure like AWS: network traffic from others

About load shedding, I don't see how you'd send more data with that than with a blocking sender:
with that sender, anything that overflows the communication channel is just dropped. Are you
sure you are measuring the actual data output and not just what you pass to the sender?

If you consider throughput is low, one possible interpretation is that the processing of the
events is just costly. You may want to look at the "---pe-processing-time" metric that reports
the duration of processing a given event. 

Note that in maybe simpler configurations (not sure what you are doing), it's possible to
easily achieve 100 or 200k events/s/stream/node

Hope this helps,

Matthieu




On Dec 19, 2013, at 10:43 , Giacomo Fiorini <giacomo.fiorini@studio.unibo.it> wrote:

> Hi,
> i was testing my s4 application on a cluster of aws instances and i encountered a problem
when trying different configurations with different number of nodes.
> I use one node to produce events reading from an input file, this adapter node sends
events to a second cluster that processes them.
> I run a test setting Sender, RemoteSender and StreamExecutor in blocking mode (with a
custom module and -emc option when deploying both the adapter and the application)
> Changing the number of processing nodes in the cluster and measuring the events/s sent
from the adapter produced these results:
> -1 node: 3500 Events/s
> -2 nodes: 8600 Events/s
> -3 nodes: 9300 Events/s
> The 3 node configuration doesn' t seem to perform well as it should; im sure it is not
an adapter problem because i tried to add another adapter node and the total number of events/s
sent from both is the same as with one adapter only.
> Plus i've tried setting LoadShedding policy as default sending at a much higher rate
(20000 ev/s) and i measured that the Kb/s sent from the adapter are roughly the double of
the blocking mode.
> I also tried to increase s4.sender.parallelism and s4.remoteSender.parallelism in the
./deploy command but it hasn't solved my problem.
> Is there something else i should try?
> 
> Regards,
> Giacomo Fiorini


Mime
View raw message