hive-user mailing list archives

From Ryan Harris <>
Subject RE: Spark Streaming, Batch interval, Windows length and Sliding Interval settings
Date Thu, 05 May 2016 18:35:57 GMT
This is really outside of the scope of Hive and would probably be better addressed by the Spark
community, however I can say that this very much depends on your use case....

Take a look at this discussion if you haven't already:!topic/spark-users/GQoxJHAAtX4

Generally speaking, the larger the batch window, the better the overall performance, but the
streaming output will be updated less frequently. You will likely run into problems
setting your batch window < 0.5 sec, and/or when the batch window is shorter than the amount of
time it takes to process a batch....

Beyond that, the window length and sliding interval need to be multiples of the batch window,
but will depend entirely on your reporting requirements.

It would be perfectly reasonable to have
batch window = 30 secs
window length = 1 hour
sliding interval = 5 mins

In that case, you'd be producing an output every 5 mins, aggregating the data you collected
every 30 seconds over the preceding 1-hour period...
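As a sketch of that configuration (the socket text source on localhost:9999 and the word-count aggregation are purely illustrative assumptions, not part of the original question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowExample")
    // Batch interval = 30 seconds.
    val ssc = new StreamingContext(conf, Seconds(30))

    // Hypothetical source; substitute your own receiver.
    val lines = ssc.socketTextStream("localhost", 9999)

    // 1-hour window, recomputed every 5 minutes.
    // Both durations are multiples of the 30-second batch interval.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Minutes(60), Minutes(5))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

`reduceByKeyAndWindow` is used here so each 5-minute output reflects the full hour of 30-second batches.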

Could you set the batch window to 5 mins? Possibly, depending on the data source, but perhaps
you are already using that source on a more frequent basis elsewhere, or maybe you only
have a 1-min buffer on the source data.... lots of possibilities, which is why there is this
flexibility and no hard-and-fast rule....

If you were trying to create continuously streaming output as fast as possible, then you would
probably (almost always) be setting your sliding interval = batch window and then shrinking
the batch window as short as possible.
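A minimal sketch of that pattern (again assuming an illustrative socket source; the 1-second batch is an example of "as short as possible" given the ~0.5-second practical floor mentioned above):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FastestOutput {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FastestOutput")
    // Batch interval pushed as low as practical (here 1 second).
    val ssc = new StreamingContext(conf, Seconds(1))
    val stream = ssc.socketTextStream("localhost", 9999)

    // Sliding interval equal to the batch interval: new output every batch.
    val windowed = stream.window(Seconds(60), Seconds(1))
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```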

More documentation here:

From: Mich Talebzadeh []
Sent: Thursday, May 05, 2016 4:26 AM
To: user
Subject: Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings

Any ideas/experience on this?

Dr Mich Talebzadeh


On 4 May 2016 at 21:45, Mich Talebzadeh wrote:

Just wanted opinions on this.

In Spark streaming the parameter

val ssc = new StreamingContext(sparkConf, Seconds(n))

defines the batch or sample interval for the incoming streams.

In addition there is the window length:

// window length - the duration of the window; must be a multiple of the
// batch interval n in StreamingContext(sparkConf, Seconds(n))

val windowLength = L

And finally the sliding interval:
// sliding interval - The interval at which the window operation is performed

val slidingInterval = I

OK, so the windowLength L must be a multiple of n, and the slidingInterval has to be consistent
with it to ensure that we can align the head and tail of the window.

So as a heuristic approach, for a batch interval of say 10 seconds, I put the window length
at 3 times that = 30 seconds and make the slidingInterval = batch interval = 10 seconds.
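That heuristic can be written out as follows (the socket source is an illustrative assumption):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Heuristic {
  def main(args: Array[String]): Unit = {
    val n = 10                                // batch interval in seconds
    val conf = new SparkConf().setAppName("Heuristic")
    val ssc = new StreamingContext(conf, Seconds(n))
    val stream = ssc.socketTextStream("localhost", 9999)

    val windowLength    = Seconds(3 * n)      // 30 s, 3x the batch interval
    val slidingInterval = Seconds(n)          // 10 s, equal to the batch interval

    stream.window(windowLength, slidingInterval).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```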

Obviously these are subjective, depending on what is being measured. However, I believe having
slidingInterval = batch interval makes sense?

Appreciate any views on this.


Dr Mich Talebzadeh

