samza-dev mailing list archives

From Milinda Pathirage <mpath...@umail.iu.edu>
Subject Re: Samza and sliding window
Date Fri, 26 Jun 2015 18:24:29 GMT
Hi Shekar,

You can use Samza's local storage (
http://samza.apache.org/learn/documentation/0.9/container/state-management.html)
to keep the window state, and its windowing capabilities (
http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
to handle window advancement. During advancement you can update the local
cache (Redis, in your case). AFAIK, Samza doesn't provide any helpers or
utilities for maintaining window state, so you have to implement it on top
of local storage. If you don't need fault tolerance, you can also keep the
state in memory (as long as it fits in memory).
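Since Samza 0.9 has no built-in sliding-window helper, the window state has to be maintained by hand. The following is a minimal plain-Java sketch, not Samza API: the class `BucketedWindowCounter` and its methods are hypothetical, and it assumes each event carries a non-negative epoch-millisecond timestamp and an application name. It keeps one ring of fixed-size time buckets per application (for example 5 one-minute buckets for a 5-minute window), so memory stays bounded; the trade-off is that the window's oldest edge moves in one-bucket steps, so counts are approximate to within one bucket. The intended wiring is to call `record()` from `StreamTask.process()` and, on each `WindowableTask.window()` callback, call `advance()` and push `snapshot()` to Redis or an output topic. Because the input stream is partitioned by application name, each task sees every event for its applications, so per-task counts are complete.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper, not part of Samza: approximate per-application event
 * counts over a sliding window, kept as a ring of fixed-size time buckets.
 * Assumes event timestamps are non-negative epoch milliseconds.
 */
final class BucketedWindowCounter {
    private final long bucketMs;
    private final int numBuckets;
    // application name -> ring buffer of per-bucket event counts
    private final Map<String, long[]> rings = new HashMap<>();
    // absolute index (timestampMs / bucketMs) of the newest bucket; -1 before the first event
    private long currentBucket = -1;

    BucketedWindowCounter(long windowMs, long bucketMs) {
        if (windowMs <= 0 || bucketMs <= 0 || windowMs % bucketMs != 0) {
            throw new IllegalArgumentException("windowMs must be a positive multiple of bucketMs");
        }
        this.bucketMs = bucketMs;
        this.numBuckets = (int) (windowMs / bucketMs);
    }

    /** Count one event; intended to be called from StreamTask.process(). */
    void record(String app, long timestampMs) {
        long bucket = timestampMs / bucketMs;
        advanceToBucket(bucket);
        if (bucket <= currentBucket - numBuckets) {
            return; // late event that has already fallen out of the window
        }
        long[] ring = rings.computeIfAbsent(app, k -> new long[numBuckets]);
        ring[(int) (bucket % numBuckets)]++;
    }

    /** Slide the window to nowMs; intended to be called from WindowableTask.window(). */
    void advance(long nowMs) {
        advanceToBucket(nowMs / bucketMs);
    }

    /** Per-application totals for the current window, e.g. to write to Redis or a topic. */
    Map<String, Long> snapshot() {
        Map<String, Long> totals = new TreeMap<>();
        rings.forEach((app, ring) -> totals.put(app, Arrays.stream(ring).sum()));
        return totals;
    }

    private void advanceToBucket(long bucket) {
        if (bucket <= currentBucket) {
            return; // the window never moves backwards
        }
        if (currentBucket >= 0) {
            // Zero every slot that is about to be reused for a newer bucket.
            long steps = Math.min(bucket - currentBucket, numBuckets);
            for (long b = bucket - steps + 1; b <= bucket; b++) {
                int slot = (int) (b % numBuckets);
                for (long[] ring : rings.values()) {
                    ring[slot] = 0;
                }
            }
            // Forget applications with no events left in the window.
            rings.values().removeIf(ring -> Arrays.stream(ring).sum() == 0);
        }
        currentBucket = bucket;
    }
}
```

In this sketch the state lives in a plain `HashMap`, which matches the "no fault tolerance, state fits in memory" option; a fault-tolerant variant would keep the per-application rings in a Samza `KeyValueStore` backed by a changelog. How often `window()` fires is controlled by the `task.window.ms` setting.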

Thanks
Milinda

On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> Yan,
>
>
> *What do you mean by "a local cache"? Is it a db like MySQL, something
> like RocksDB, or even just in-memory?*
>
> Local cache as in Redis
>
>
>
> *When you say "another topic", is this the topic consumed by the same
> Samza job as your 5-minutes-job, or in a separate job? What is the
> relation between the topic and the application name?*
>
> We don't have a 5-minute job. All we have now is a stream of events coming
> from a bunch of applications, all of which land on a raw Kafka topic. Each
> event includes the application name. I want to create a job that takes the
> incoming stream, groups it by application name, and counts the number of
> events per application in a 5-minute sliding window.
>
> - Shekar
>
> On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang724@gmail.com> wrote:
>
> > Hi Shekar,
> >
> > Need a little more clarification.
> >
> > What do you mean by "a local cache"? Is it a db like MySQL, something like
> > RocksDB, or even just in-memory?
> >
> > When you say "another topic", is this the topic consumed by the same Samza
> > job as your 5-minutes-job, or in a separate job? What is the relation
> > between the topic and the application name?
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang724@gmail.com
> >
> > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctippur@gmail.com> wrote:
> >
> > > Hello,
> > > My apologies if I have raised it earlier.
> > > Here is the use case:
> > > I have a stream that is partitioned by application name. I want to be
> > > able to count the number of events for a particular application over
> > > the past 5 minutes (sliding window) and write the result either to
> > > another topic or to a local cache.
> > >
> > > Is this possible with version 0.9 of Samza?
> > > If not, what is the easiest way to achieve this?
> > >
> > > - Shekar
> > >
> >
>
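Shekar's use case (count events per application name over a 5-minute sliding window) can also be answered exactly rather than in fixed buckets, by keeping one timestamp per event and evicting timestamps older than the window whenever a count is requested. The sketch below is plain Java, not Samza API; the class `ExactSlidingCounter` is hypothetical, and it assumes each application's events arrive in non-decreasing timestamp order (epoch milliseconds).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Samza: exact per-application event counts
 * over the last windowMs milliseconds, keeping one timestamp per event.
 * Assumes each application's events arrive in non-decreasing timestamp order.
 */
final class ExactSlidingCounter {
    private final long windowMs;
    // application name -> timestamps still inside the window, oldest first
    private final Map<String, Deque<Long>> events = new HashMap<>();

    ExactSlidingCounter(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Remember one event for the given application. */
    void record(String app, long timestampMs) {
        events.computeIfAbsent(app, k -> new ArrayDeque<>()).addLast(timestampMs);
    }

    /** Events for app with timestamp > nowMs - windowMs; older ones are evicted. */
    int count(String app, long nowMs) {
        Deque<Long> window = events.get(app);
        if (window == null) {
            return 0;
        }
        while (!window.isEmpty() && window.peekFirst() <= nowMs - windowMs) {
            window.pollFirst();
        }
        if (window.isEmpty()) {
            events.remove(app); // free memory for idle applications
        }
        return window.size();
    }
}
```

The cost of exactness is memory proportional to the number of events inside the window, which is exactly where the "as long as the state fits in memory" caveat applies; a bucketed counter bounds memory at the price of precision at the window edge.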



-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
