samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shekar Tippur <ctip...@gmail.com>
Subject Re: Samza and sliding window
Date Fri, 26 Jun 2015 18:39:06 GMT
Thanks Milinda.
Is this feature available on 0.8 version of Samza?

- Shekar

On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <mpathira@umail.iu.edu>
wrote:

> Hi Shekar,
>
> You can use Samza's local storage (
>
> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
> )
> to keep the window state and windowing (
> http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
> capabilities to handle the window advancement. During advancement you can
> update the local cache (Redis in your case). AFAIK, Samza doesn't provide
> any helpers or utilities to handle window state maintenance. You have to
> implement it on top of local storage or if you don't won't fault tolerance
> you can keep the state in-memory too (as long as the state fit in memory).
>
> Thanks
> Milinda
>
> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctippur@gmail.com> wrote:
>
> > Yan,
> >
> >
> > *What do you mean by "a local cache"? Is it a db like MySQL, something
> > likeRocksDB, or even just in-memory?*
> >
> > Local cache as in Redis
> >
> >
> >
> > *When you say "another topic", is this the topic consumed by the same
> > Samzajob as your 5-minutes-job, or in a separate job? What is the
> > relationbetween the topic and the application name*
> >
> > We dont have a 5 min job. All we have now is a stream of events coming
> from
> > a bunch of applications. All these land on a raw kafka topic. The stream
> > data has application name. I want to create a job that takes incoming
> > stream and group it by application name and count the number of events we
> > get in a 5 min sliding window.
> >
> > - Shekar
> >
> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang724@gmail.com> wrote:
> >
> > > Hi Shekar,
> > >
> > > Need a little more clarification.
> > >
> > > What do you mean by "a local cache"? Is it a db like MySQL, something
> > like
> > > RocksDB, or even just in-memory?
> > >
> > > When you say "another topic", is this the topic consumed by the same
> > Samza
> > > job as your 5-minutes-job, or in a separate job? What is the relation
> > > between the topic and the application name?
> > >
> > > Thanks,
> > >
> > > Fang, Yan
> > > yanfang724@gmail.com
> > >
> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctippur@gmail.com>
> > wrote:
> > >
> > > > Hello,
> > > > My apologies if I have raised it earlier.
> > > > Here is the use case:
> > > > I have a stream that is partitioned based on application name. I want
> > to
> > > be
> > > > able to count hte number of events happening for that particular
> > > > application in the past 5 minutes (sliding window) and update either
> > > > another topic or a local cache.
> > > >
> > > > Is this possible via 0.9 version of Samza?
> > > > If not, what is the easiest way to achieve this?
> > > >
> > > > - Shekar
> > > >
> > >
> >
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message