storm-user mailing list archives

From Software Dev <static.void....@gmail.com>
Subject Re: Implementing Real-Time Trending Topics in Storm
Date Tue, 01 Apr 2014 19:47:03 GMT
> Does that make sense?

Yes and no.

In the example on your blog the RollingCountBolt is configured with 9
and 3, which I understand to mean: emit the counts for the last
9-second rolling window every 3 seconds. What I still don't understand
is the 2-second emit frequency of the other bolts.
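
To check my understanding of the 9 and 3 setting, here is a rough sketch
in Python. It is not the storm-starter code: the class below only borrows
the SlidingWindowCounter name, and its methods, the simulate() helper and
the sample events are all my own assumptions. The counter only knows about
slots (window length / emit frequency = 3 of them), and the caller decides
when a slot ends, which is how I read the decoupling you describe.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts objects over the last `num_slots` slots; knows nothing about time."""

    def __init__(self, num_slots):
        if num_slots < 2:
            raise ValueError("need at least two slots")
        # One Counter per slot; the newest slot is at the right end.
        self.slots = deque((Counter() for _ in range(num_slots)), maxlen=num_slots)

    def increment(self, obj):
        self.slots[-1][obj] += 1

    def get_counts_then_advance(self):
        totals = Counter()
        for slot in self.slots:
            totals.update(slot)
        # Appending to a full deque drops the oldest slot.
        self.slots.append(Counter())
        return dict(totals)


def simulate(events, window_secs=9, emit_secs=3, run_secs=15):
    """Plays the role of the bolt: it tracks time and tells the counter when to advance.

    `events` is a list of (second, word) pairs; an event with second t
    arrives during the interval [t, t+1).
    """
    counter = SlidingWindowCounter(window_secs // emit_secs)
    by_second = {}
    for t, word in events:
        by_second.setdefault(t, []).append(word)

    emitted = []
    for sec in range(run_secs):
        for word in by_second.get(sec, []):
            counter.increment(word)
        if (sec + 1) % emit_secs == 0:  # hypothetical "tick" every emit_secs
            emitted.append((sec + 1, counter.get_counts_then_advance()))
    return emitted


if __name__ == "__main__":
    sample = [(0, "storm"), (1, "storm"), (4, "spark"), (10, "storm")]
    for when, counts in simulate(sample):
        print(f"t={when}s window counts: {counts}")
```

If this is right, the two "storm" events from seconds 0 and 1 are still
counted in the emit at t=9 but are gone from the emit at t=12, because
by then they are more than 9 seconds old.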

On Tue, Apr 1, 2014 at 11:20 AM, Michael G. Noll
<michael+storm@michael-noll.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> "Software Dev",
>
> in RollingCountBolt there are two *time* related settings:
>
> 1. The size (duration) of the sliding window itself.  In seconds.
> 2. The time interval at which the latest sliding window count is sent
> to downstream bolts.  In seconds.
>
> See details here:
> https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/bolt/RollingCountBolt.java
>
> I'm quoting from the code above:
>
> "The bolt is configured by two parameters, the length of the sliding
> window in seconds (which influences the output data of the bolt, i.e.
> how it will count objects) and the emit frequency in seconds (which
> influences how often the bolt will output the latest window counts).
> For instance, if the window length is set to an equivalent of five
> minutes and the emit frequency to one minute, then the bolt will
> output the latest five-minute sliding window every minute."
>
>
>> Does this mean that the rolling counts for the last 9 events are
>> ranked and emitted every 2 seconds? 7 seconds?
>
> The RollingCountBolt "thinks" in seconds.  However, behind the scenes
> RollingCountBolt uses SlidingWindowCounter [1], which in turn is built
> upon SlotBasedCounter [2].  Both the SlidingWindowCounter and the
> SlotBasedCounter don't know anything about time or durations (no
> seconds, minutes, and such).  This is by design, as it decouples the
> responsibility of counting (SlidingWindowCounter/SlotBasedCounter)
> from the responsibility of tracking the time (RollingCountBolt).
>
> The Apache Spark project has exactly the same notion of
> emitFrequencyInSeconds and windowLengthInSeconds, which they call
> slideInterval and windowLength.  See
> https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html.
>  They also have a similar diagram to what I showed in [3] that
> explains the idea behind sliding windows, see section "Window
> Operations" in the Spark link above.
>
>
> Does that make sense?
> Michael
>
>
>
> [1]
> https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlidingWindowCounter.java
> [2]
> https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlotBasedCounter.java
> [3]
> http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
>
>
> On 01.04.2014 18:45, Software Dev wrote:
>> In the article
>> (http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
>> and I was wondering what the rationale was for the emit frequencies
>> and how they all relate to each other.
>>
>> In the example the RollingCountBolt emits every 3 seconds,
>> IntermediateRankingBolt every 2 seconds and TotalRankingBolt every
>> 2 seconds. Does this mean that the rolling counts for the last 9
>> events are ranked and emitted every 2 seconds? 7 seconds? A little
>> confused.
>>
>> Thanks
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (MingW32)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlM7A2kACgkQeW5XuG18ujR93wCdHE6Ldu01fRgnMqjIi7chVMbu
> uEMAnjUyrZQq0xkg2REUzbgvk31A85Dm
> =YI7Y
> -----END PGP SIGNATURE-----