I'm building a transactional Trident topology using the Kafka spout from https://github.com/wurstmeister/storm-kafka-0.8-plus/.

What is the best approach to calculating the max spout pending value (topology.max.spout.pending, set via Config.setMaxSpoutPending()) for Trident? Quoting the Storm FAQ:

"Start with a max spout pending that is for sure too small -- one for trident, or the number of executors for storm -- and increase it until you stop seeing changes in the flow. You'll probably end up with something near 2*(throughput in recs/sec)*(end-to-end latency) (2x the Little's law capacity)."

My topology has an average throughput of 2000 tuples/s and an end-to-end latency of 50 ms. Applying that formula with the latency converted from milliseconds to seconds, my max spout pending value should be 2 * 2000 tuples/s * 0.050 s = 200 tuples.
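
To make the arithmetic explicit, here is a minimal sketch of how I computed the estimate above. The function name and its parameters are my own and are not part of any Storm API; it only encodes the FAQ's rule of thumb (2x the Little's law capacity) and assumes throughput is given per second and latency in milliseconds.

```python
import math


def max_spout_pending(throughput_per_sec, latency_ms, safety_factor=2.0):
    """Hypothetical helper: FAQ rule of thumb, safety_factor * throughput * latency."""
    # Little's law: items in flight = arrival rate * time each item spends in the system.
    # Multiply before dividing so the ms -> s conversion stays exact for round inputs.
    in_flight = throughput_per_sec * latency_ms / 1000.0
    # Round up so the cap is never below the estimated requirement.
    return math.ceil(safety_factor * in_flight)


# My numbers: 2000 tuples/s and 50 ms end-to-end latency.
print(max_spout_pending(2000, 50))  # prints 200
```

Note that this yields a count of tuples; whether Trident applies the setting per tuple or per batch is part of what I am unsure about.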

Is that calculation correct? What is the best way to verify that the value is optimal for the topology? Also, what approach would you recommend for setting the Trident batch emit interval (topology.trident.batch.emit.interval.millis)?


Danijel Schiavuzzi
