spark-user mailing list archives

From Ashish Sharma <ashishonl...@gmail.com>
Subject updateStateByKey and invFunction
Date Tue, 24 Feb 2015 08:06:14 GMT
Say I want to calculate the top K users visiting a page over the past 2 hours,
updated every 5 minutes.

For this I want to maintain state like the following:

Page_01 => {user_01:32, user_02:3, user_03:7...}
...

Basically, a count of the number of times each user visited a page. Here my
key is the page name/id and the state is the hashmap.

Now in updateStateByKey I get the previous state and the new events coming *in*
to the window. Is there a way to also get the events going *out* of the
window? That way I can incrementally update the state over a rolling window.
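
To make the idea concrete, here is a rough sketch in plain Python (not Spark
code) of the incremental update for a single page. The function names
add_batch, remove_batch and top_k and the sample counts are made up for
illustration. The two update functions play the same roles as the reduce and
inverse-reduce functions that reduceByKeyAndWindow accepts, though whether
that API fits a per-page hashmap state is exactly the open question.

```python
from collections import Counter


def add_batch(state, entering):
    """Return a new state with counts for events entering the window added."""
    updated = Counter(state)
    updated.update(entering)
    return updated


def remove_batch(state, leaving):
    """Return a new state with counts for events leaving the window removed.

    Users whose count drops to zero are dropped so the map does not grow forever.
    """
    updated = Counter(state)
    updated.subtract(leaving)
    return Counter({user: n for user, n in updated.items() if n > 0})


def top_k(state, k):
    """Return the k most frequent users as (user, count) pairs."""
    return state.most_common(k)


# Hypothetical state for Page_01 plus one batch entering and one leaving.
state = Counter({"user_01": 32, "user_02": 3, "user_03": 7})
entering = Counter({"user_02": 5, "user_04": 1})
leaving = Counter({"user_01": 30, "user_03": 7})

state = remove_batch(add_batch(state, entering), leaving)
print(top_k(state, 2))  # [('user_02', 8), ('user_01', 2)]
```

The point is that each 5-minute slide only touches the events entering and
leaving, instead of recounting the whole 2-hour window.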

What is an efficient way to do this in Spark Streaming?

Thanks
Ashish
