hadoop-common-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: aggregation by time window
Date Mon, 28 Jan 2013 13:51:41 GMT
Hi Zhiwei,
    No :-). Every 7 minutes is easy: just convert the time to
milliseconds, and integer-dividing by (7 * 60000) gives you a bucket key.
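Below is a minimal sketch of the fixed-bucket idea. It is simulated locally in Python, for example as the logic a Hadoop Streaming mapper and reducer might run. Several details are assumptions: the helper names `to_millis`, `map_phase` and `reduce_phase` are hypothetical, times are treated as time-of-day strings as in the table quoted below, and buckets are aligned to midnight. Note the integer division: the key is `millis // (7 * 60000)`, not `millis / 7 * 60000`.

```python
from collections import defaultdict

# T = 7 minutes expressed in milliseconds.
WINDOW_MS = 7 * 60_000


def to_millis(hhmm: str) -> int:
    """Hypothetical helper: 'HH:MM' time of day -> milliseconds since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return (hours * 60 + minutes) * 60_000


def map_phase(records):
    """Mapper: emit (bucket_key, event_id). Integer division picks the 7-minute bucket."""
    for event_id, hhmm in records:
        yield to_millis(hhmm) // WINDOW_MS, event_id


def reduce_phase(pairs):
    """Shuffle + reducer: collect the event ids that share a bucket key."""
    groups = defaultdict(list)
    for key, event_id in pairs:
        groups[key].append(event_id)
    # Sort by key to mimic the key-ordered input a Hadoop reducer sees.
    return dict(sorted(groups.items()))


records = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]
buckets = reduce_phase(map_phase(records))
print(buckets)
# {86: ['event1'], 87: ['event2', 'event3'], 88: ['event4'], 89: ['event5', 'event6']}
```

With midnight-aligned buckets, the sample events land in four buckets, and event1 and event4 each end up alone. That does not match the two groups of three in the original question, which is presumably why fixed windows do not answer it.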

What I need is the following:
    Find the events that occurred within time T of event X.

In a very naive approach, I would take the first event and find the other events
that happened within 7 minutes of its time. But I think that will be very slow,
and I am looking for a way to improve on this naive approach.
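One way to read the naive approach is as a single sweep over time-sorted events. The earliest ungrouped event becomes an anchor, and every later event less than T after it joins that anchor's group. The Python sketch below follows that interpretation. The half-open `< anchor + T` boundary and the names `to_minutes` and `group_by_anchor` are assumptions, not from the thread. After one sort, each event is visited once, rather than rescanning all events for every anchor.

```python
WINDOW_MIN = 7  # T, in minutes


def to_minutes(hhmm: str) -> int:
    """Hypothetical helper: 'HH:MM' time of day -> minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def group_by_anchor(records, window=WINDOW_MIN):
    """Sort once, then sweep: the earliest ungrouped event anchors a new group,
    and each later event joins it while its time is < anchor + window."""
    ordered = sorted(records, key=lambda r: to_minutes(r[1]))
    groups = []
    anchor = None
    for event_id, hhmm in ordered:
        t = to_minutes(hhmm)
        if anchor is None or t - anchor >= window:
            anchor = t          # this event starts a new group
            groups.append([])
        groups[-1].append(event_id)
    return groups


# Input deliberately shuffled to show that the function sorts first.
records = [
    ("event4", "10:20"), ("event1", "10:07"), ("event6", "10:25"),
    ("event2", "10:10"), ("event5", "10:23"), ("event3", "10:12"),
]
groups = group_by_anchor(records)
print(groups)
# [['event1', 'event2', 'event3'], ['event4', 'event5', 'event6']]
```

On the six sample events, this yields event1-event3 and event4-event6, matching the grouping in the original question. In MapReduce, the sort could be left to the shuffle, which orders keys before the reduce phase, and the sweep done in a reducer. Because each group depends on the previous anchor, though, the sweep needs its events in one ordered partition. Splitting the data, for example by day, is a further assumption, and it would cut any group that crosses the split point.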

Thanks
Oleg.



On Mon, Jan 28, 2013 at 3:09 PM, Zhiwei Lin <zhiwei.uk@gmail.com> wrote:

> do you mean every 7 mins?
> e.g, [10:07, 10:14),
>        [10:14, 10:21) .....
>
> On 28 January 2013 12:56, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>
> > Hi ,
> >     I have the following row data structure:
> >
> > event_id  |   time
> > ==============
> > event1     |  10:07
> > event2     |  10:10
> > event3     |  10:12
> >
> > event4     |   10:20
> > event5     |   10:23
> > event6     |   10:25
> >
> > The number of records is 50-100 million.
> >
> > Question:
> >    I need to get the events that occurred within a time window T.
> >
> > For example, if T = 7 minutes:
> >      event1, event2, event3 were detected within 7 minutes.
> >      event4, event5, event6 were detected within 7 minutes.
> >
> > How can I implement such an aggregation using map/reduce?
> >
> > Thanks
> > Oleg.
> >
>
>
>
> --
>
> Best wishes.
>
> Zhiwei
>
