samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Milinda Pathirage <mpath...@umail.iu.edu>
Subject Re: Samza and sliding window
Date Mon, 29 Jun 2015 14:54:01 GMT
Hi Shekar,

You can use Kafka's partitioning capabilities to partition your stream
based on application. That will make sure events related to a application
will always ended up in same partition. With this you will have multiple
applications in same partition and each partition will be mapped to a
single Samza task instance (AFAIK, Samza job is devided into several task
instances based on number of partitions in your topic.). Then in your Samza
task implementation you should maintain windowing counts for each
application.

Thanks
Milinda

On Sun, Jun 28, 2015 at 8:48 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> Milinda,
>
> I see that the document you mentioned addresses windowing but I also need
> to group by different applications.
>
> Application    Count
> ---------------    --------
> A                    100
> B                    40
> C                    69
> ....
>
> - Shekar
>
> On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctippur@gmail.com> wrote:
>
> > Never mind. I see it here:
> >
> > http://samza.apache.org/learn/documentation/0.8/container/windowing.html
> >
> > Thanks again Milinda.
> >
> > - Shekar
> >
> > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctippur@gmail.com>
> wrote:
> >
> >> Thanks Milinda.
> >> Is this feature available on 0.8 version of Samza?
> >>
> >> - Shekar
> >>
> >> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <
> >> mpathira@umail.iu.edu> wrote:
> >>
> >>> Hi Shekar,
> >>>
> >>> You can use Samza's local storage (
> >>>
> >>>
> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
> >>> )
> >>> to keep the window state and windowing (
> >>>
> http://samza.apache.org/learn/documentation/0.9/container/windowing.html
> >>> )
> >>> capabilities to handle the window advancement. During advancement you
> can
> >>> update the local cache (Redis in your case). AFAIK, Samza doesn't
> provide
> >>> any helpers or utilities to handle window state maintenance. You have
> to
> >>> implement it on top of local storage or if you don't won't fault
> >>> tolerance
> >>> you can keep the state in-memory too (as long as the state fit in
> >>> memory).
> >>>
> >>> Thanks
> >>> Milinda
> >>>
> >>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctippur@gmail.com>
> >>> wrote:
> >>>
> >>> > Yan,
> >>> >
> >>> >
> >>> > *What do you mean by "a local cache"? Is it a db like MySQL,
> something
> >>> > likeRocksDB, or even just in-memory?*
> >>> >
> >>> > Local cache as in Redis
> >>> >
> >>> >
> >>> >
> >>> > *When you say "another topic", is this the topic consumed by the same
> >>> > Samzajob as your 5-minutes-job, or in a separate job? What is the
> >>> > relationbetween the topic and the application name*
> >>> >
> >>> > We dont have a 5 min job. All we have now is a stream of events
> coming
> >>> from
> >>> > a bunch of applications. All these land on a raw kafka topic. The
> >>> stream
> >>> > data has application name. I want to create a job that takes incoming
> >>> > stream and group it by application name and count the number of
> events
> >>> we
> >>> > get in a 5 min sliding window.
> >>> >
> >>> > - Shekar
> >>> >
> >>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang724@gmail.com>
> >>> wrote:
> >>> >
> >>> > > Hi Shekar,
> >>> > >
> >>> > > Need a little more clarification.
> >>> > >
> >>> > > What do you mean by "a local cache"? Is it a db like MySQL,
> something
> >>> > like
> >>> > > RocksDB, or even just in-memory?
> >>> > >
> >>> > > When you say "another topic", is this the topic consumed by the
> same
> >>> > Samza
> >>> > > job as your 5-minutes-job, or in a separate job? What is the
> relation
> >>> > > between the topic and the application name?
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Fang, Yan
> >>> > > yanfang724@gmail.com
> >>> > >
> >>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctippur@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > > > Hello,
> >>> > > > My apologies if I have raised it earlier.
> >>> > > > Here is the use case:
> >>> > > > I have a stream that is partitioned based on application
name. I
> >>> want
> >>> > to
> >>> > > be
> >>> > > > able to count hte number of events happening for that particular
> >>> > > > application in the past 5 minutes (sliding window) and update
> >>> either
> >>> > > > another topic or a local cache.
> >>> > > >
> >>> > > > Is this possible via 0.9 version of Samza?
> >>> > > > If not, what is the easiest way to achieve this?
> >>> > > >
> >>> > > > - Shekar
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Milinda Pathirage
> >>>
> >>> PhD Student | Research Assistant
> >>> School of Informatics and Computing | Data to Insight Center
> >>> Indiana University
> >>>
> >>> twitter: milindalakmal
> >>> skype: milinda.pathirage
> >>> blog: http://milinda.pathirage.org
> >>>
> >>
> >>
> >
>



-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message