storm-user mailing list archives

From Mauro Giusti <mau...@microsoft.com>
Subject RE: A Batching Bolt
Date Mon, 20 Nov 2017 16:46:26 GMT
Marco,
Our first bolt emits a summarized record of the information we receive from the spouts.
It is time based: every 30 seconds we emit one record that summarizes all the records we
received from the spout.
We don't re-emit the source records we received from the spouts; they are persisted
on cold-path storage, though, so we can access them offline for detailed analysis.
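A minimal sketch of the time-based summarizing step described above, written as plain Python logic rather than Storm code. The summary fields (a count and a total of one numeric value) are assumptions for illustration, as are the class and method names. In a real Storm bolt the periodic `maybe_flush` call would typically be driven by a tick tuple, enabled per component with `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary record: how many records arrived and their total."""
    window_start: float
    count: int
    total: float


class TimeWindowSummarizer:
    """Folds incoming records into one Summary per time window (30 s in the thread)."""

    def __init__(self, window_secs: float, now: float) -> None:
        self.window_secs = window_secs
        self.window_start = now
        self.count = 0
        self.total = 0.0

    def add(self, value: float) -> None:
        # Only the running aggregate is kept; the source record is not re-emitted.
        self.count += 1
        self.total += value

    def maybe_flush(self, now: float) -> Optional[Summary]:
        # Called periodically (e.g. on a tick); returns a summary once the window has elapsed.
        if now - self.window_start < self.window_secs:
            return None
        summary = Summary(self.window_start, self.count, self.total)
        # Start a fresh window.
        self.window_start = now
        self.count = 0
        self.total = 0.0
        return summary


summarizer = TimeWindowSummarizer(window_secs=30, now=0)
summarizer.add(1.5)
summarizer.add(2.5)
print(summarizer.maybe_flush(now=10))  # None: window still open
print(summarizer.maybe_flush(now=30))  # Summary(window_start=0, count=2, total=4.0)
```

Downstream bolts then see one small tuple per window instead of every source record, which is the main reason this pattern keeps traffic between bolts low.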

Is this similar to what you are trying to do?

Thx,
Mauro.

From: Marco Costantini [mailto:mcsilvio@gmail.com]
Sent: Monday, November 20, 2017 1:01 AM
To: user@storm.apache.org
Subject: A Batching Bolt

Hello,
I need to group/batch tuples. I've seen an excellent tutorial that does this. It handles
timeouts and batch-size breaches. Great. However, in that tutorial all of the logic takes place
in the final bolt, so it never has the problem of "emitting batched information".

Sadly for me, I want to create a distinct batching bolt in the middle of a topology. This
means I have to worry about emitting batches of information.
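The two flush triggers mentioned above (batch full, batch waited too long) can be sketched as plain Python logic, independent of Storm's API. The `emit` callback and the `on_tick` hook are hypothetical stand-ins for the bolt's output collector and a tick tuple, and anchoring/acking of the buffered input tuples is left out. One detail that matters once batching moves into a middle bolt: the emitted list should not be the same object the bolt keeps appending to, since in-process hand-offs may pass it on without copying.

```python
import time
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SizeOrTimeBatcher(Generic[T]):
    """Buffers items and hands a batch downstream when it is full or too old."""

    def __init__(self, max_size: int, max_age_secs: float,
                 emit: Callable[[List[T]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.max_age_secs = max_age_secs
        self.emit = emit  # hypothetical stand-in for the bolt's emit call
        self.clock = clock
        self.buffer: List[T] = []
        self.first_arrival: Optional[float] = None

    def add(self, item: T) -> None:
        if not self.buffer:
            self.first_arrival = self.clock()
        self.buffer.append(item)
        if len(self.buffer) >= self.max_size:  # batch-size breach
            self.flush()

    def on_tick(self) -> None:
        # Timeout: flush a partial batch whose oldest item has waited too long.
        if self.buffer and self.clock() - self.first_arrival >= self.max_age_secs:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # Hand off the current list and start a fresh one, so later appends
        # never mutate a batch that has already been emitted.
        batch, self.buffer = self.buffer, []
        self.first_arrival = None
        self.emit(batch)


emitted = []
now = [0.0]
batcher = SizeOrTimeBatcher(max_size=3, max_age_secs=5.0,
                            emit=emitted.append, clock=lambda: now[0])
for x in "abcd":
    batcher.add(x)  # "abc" fills a batch; "d" starts the next one
now[0] = 6.0
batcher.on_tick()  # "d" has waited past 5 s, so it is flushed alone
print(emitted)  # [['a', 'b', 'c'], ['d']]
```

Injecting the clock keeps the timeout logic testable without sleeping; in a bolt the same object would simply be driven by incoming tuples and ticks.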

I tried it both ways: with the batching done in the final bolt, and with the batching done
in a separate bolt. When it's done in the final bolt, all is well. When it's done in a separate
bolt, performance suffers greatly. By performance I mean the indexing rate of Elasticsearch (probably
not a good measure of performance, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best practices for batching
in a distinct batching bolt?

Please and thank you,
Marco.