storm-user mailing list archives

From Aniket Alhat <aniket.al...@gmail.com>
Subject Re: How to efficiently store the intermediate result of a bolt, and so it can be replayed after the crashes?
Date Fri, 07 Feb 2014 04:56:47 GMT
I hope this helps:

https://github.com/pict2014/storm-redis
On Feb 7, 2014 12:07 AM, "Cheng-Kang Hsieh (Andy)" <changun@cs.ucla.edu> wrote:

> Sorry, I realize that question was badly written. Simply put: is there a
> recommended way to store the tuples emitted by a bolt so that they can be
> replayed after a crash, without reprocessing everything from the source
> spout? Any advice would be appreciated.
> Thank you!
>
> Best,
> Andy
>
>
> On Tue, Feb 4, 2014 at 11:58 AM, Cheng-Kang Hsieh (Andy) <changun@cs.ucla.edu> wrote:
>
>> Hi all,
>>
>> First of all, thanks to Nathan and all the contributors for putting together
>> such a great framework! I am learning a lot just from reading the discussion
>> threads.
>>
>> I am building a topology that contains one spout followed by a chain of
>> bolts (e.g., S -> A -> B, where S is the spout and A and B are bolts).
>>
>> When S emits a tuple, bolt A buffers it in a DFS. Once A has received a
>> sufficient amount of data, it computes some aggregate values and emits the
>> aggregation results to the next bolt, B.
>>
>> Here is my question: is there a recommended way to store the intermediate
>> results emitted by a bolt so that, when a machine crashes, the results can
>> be replayed to the downstream bolts (i.e., bolt B)?
>>
>> One possible solution: don't keep any intermediate results, but rely on
>> Storm's ack framework, so that the raw data is replayed from spout S when a
>> crash happens.
>>
>> However, this approach might not be appropriate in my case: it can take a
>> long time (a couple of hours) before bolt A has received all the required
>> data and emits the aggregated results, so it would be very expensive for the
>> ack framework to keep tracking that many tuples for that long.
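To make that cost concrete: under the ack-based approach, bolt A would anchor its aggregate to every buffered input (in Storm, `collector.emit(anchors, values)`) and ack the inputs only after emitting, so every raw tuple stays pending in the acker for the whole window. The Storm-free Java sketch below models only that bookkeeping; the class name, the sum aggregate, the window size and the input rate are hypothetical. It also compares the window length with Storm's default message timeout (`topology.message.timeout.secs`, 30 seconds), after which un-acked tuples are failed back to the spout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Storm-free model of bolt A under the "rely on acking" approach.
// In a real bolt, add() would correspond to execute(tuple); the emit step would be
// collector.emit(anchors, values) followed by collector.ack(t) for each anchor.
class AnchoredAggregator {
    private final int windowSize;                      // inputs needed before A emits (assumption)
    private final List<Long> pending = new ArrayList<>(); // buffered inputs, still un-acked
    private long maxPending = 0;                       // peak number of un-acked inputs

    AnchoredAggregator(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Buffers one input; returns the aggregate (a sum here) once the window is full, else null. */
    Long add(long value) {
        pending.add(value);
        maxPending = Math.max(maxPending, pending.size());
        if (pending.size() < windowSize) {
            return null; // every buffered input is still being tracked by the acker
        }
        long sum = 0;
        for (long v : pending) sum += v;
        // Emit anchored to all buffered inputs, then ack them; only now are they released.
        pending.clear();
        return sum;
    }

    int pendingCount() { return pending.size(); }

    long maxPending() { return maxPending; }

    /** True if a window filled at this input rate lasts longer than the message timeout. */
    static boolean windowOutlivesTimeout(double tuplesPerSecond, int windowSize, int timeoutSecs) {
        return windowSize / tuplesPerSecond > timeoutSecs;
    }

    public static void main(String[] args) {
        AnchoredAggregator a = new AnchoredAggregator(3);
        System.out.println(a.add(1)); // null
        System.out.println(a.add(2)); // null
        System.out.println(a.add(3)); // 6
        // A two-hour window at 100 tuples/s versus the default 30 s timeout.
        System.out.println(windowOutlivesTimeout(100, 720_000, 30)); // true
    }
}
```

With a window measured in hours, every input would exceed the default timeout long before the aggregate is emitted, so the timeout would have to be raised to cover the whole window, with the acker holding state for all of those tuples in the meantime.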
>>
>> An alternative solution: *make bolt A also a spout*, and keep the emitted
>> data in a queue on the DFS. When a result has been acked, bolt A removes it
>> from the queue.
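A minimal, Storm-free sketch of that second idea, assuming bolt A persists each aggregate before emitting it and treats bolt B's ack as permission to delete it. The method names deliberately mirror Storm's `ISpout` callbacks (`nextTuple()`, `ack(Object msgId)`, `fail(Object msgId)`), but `PendingResultQueue` is hypothetical and the `LinkedHashMap` only stands in for the DFS-backed queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of bolt A acting as a spout for its own results:
// persist first, emit, delete on ack, re-emit on fail, replay everything after a crash.
// The Map passed in stands in for the DFS queue (an assumption); a real version
// would write each entry to durable storage before emitting it.
class PendingResultQueue {
    private final Map<Long, String> durable;               // msgId -> result, survives crashes
    private final Deque<Long> toEmit = new ArrayDeque<>(); // ids waiting to be (re)emitted
    private long nextId;

    /** Recovery: every entry still in the durable store is scheduled for replay. */
    PendingResultQueue(Map<Long, String> durable) {
        this.durable = durable;
        long max = -1;
        for (long id : durable.keySet()) {
            toEmit.add(id);
            max = Math.max(max, id);
        }
        this.nextId = max + 1;
    }

    /** Called when A finishes an aggregate: store it durably, then schedule emission. */
    long enqueue(String result) {
        long id = nextId++;
        durable.put(id, result);
        toEmit.add(id);
        return id;
    }

    /** Like nextTuple(): the next msgId to emit to bolt B, or null if none. */
    Long nextToEmit() { return toEmit.poll(); }

    String resultFor(long id) { return durable.get(id); }

    /** Like ack(msgId): B has fully processed the result, so it can be dropped. */
    void ack(long id) { durable.remove(id); }

    /** Like fail(msgId): schedule the same stored result for re-emission. */
    void fail(long id) {
        if (durable.containsKey(id)) toEmit.add(id);
    }

    int pendingCount() { return durable.size(); }

    public static void main(String[] args) {
        Map<Long, String> dfs = new LinkedHashMap<>(); // stand-in for the DFS queue
        PendingResultQueue q = new PendingResultQueue(dfs);
        long first = q.enqueue("agg-1");
        q.enqueue("agg-2");
        q.nextToEmit();
        q.nextToEmit();  // both results emitted to bolt B
        q.ack(first);    // only agg-1 is acked before the "crash"
        PendingResultQueue recovered = new PendingResultQueue(dfs);
        System.out.println(recovered.resultFor(recovered.nextToEmit())); // agg-2
    }
}
```

Because a result is deleted only after its ack arrives, a crash between B finishing its work and the ack reaching A replays that result again, so this gives at-least-once delivery and B should tolerate duplicates.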
>>
>> I am wondering whether it is reasonable to make a task both a bolt and a
>> spout at the same time, or whether there is a better approach.
>>
>> Thank you!
>>
>> --
>> Cheng-Kang Hsieh
>> UCLA Computer Science PhD Student
>> M: (310) 990-4297
>> A: 3770 Keystone Ave. Apt 402,
>>      Los Angeles, CA 90034
>>
>
