hadoop-common-user mailing list archives

From: Kai Voigt...@123.org>
Subject: Re: aggregation by time window
Date: Mon, 28 Jan 2013 13:17:14 GMT

Quick idea:

Since each of your events falls into several buckets, you could have map() emit each event
once for every bucket it belongs to.

On 28.01.2013 at 13:56, Oleg Ruchovets <oruchovets@gmail.com> wrote:

> Hi,
>    I have the following row data structure:
> 
> event_id  |   time
> ==============
> event1     |  10:07
> event2     |  10:10
> event3     |  10:12
> 
> event4     |   10:20
> event5     |   10:23
> event6     |   10:25

map(event1,10:07) would emit (10:04,event1), (10:05,event1), ..., (10:10,event1) and so on.

In reduce(), all the events you want for a given minute would then meet under the same key.
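
To make the idea concrete, here is a minimal sketch in plain Python that simulates the map, shuffle and reduce steps in memory. It is not Hadoop code. The function names (map_event, shuffle, reduce_bucket, run) are made up for illustration, and the window of 3 minutes before and after each event is an assumption chosen only so that map(event1, 10:07) emits 10:04 through 10:10 as in the example above. Adjust it to whatever window you actually need.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%H:%M"

def map_event(event_id, time_str, before=3, after=3):
    """Emit (bucket_minute, event_id) once for every bucket the event belongs to.

    The before/after window widths are assumptions for illustration.
    """
    t = datetime.strptime(time_str, FMT)
    for offset in range(-before, after + 1):
        yield (t + timedelta(minutes=offset)).strftime(FMT), event_id

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_bucket(bucket, event_ids):
    """All events for one minute bucket arrive together here."""
    return bucket, sorted(event_ids)

def run(events):
    pairs = (pair for event_id, time_str in events
             for pair in map_event(event_id, time_str))
    grouped = shuffle(pairs)
    return dict(reduce_bucket(b, ids) for b, ids in sorted(grouped.items()))

# The rows from the original question.
events = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]

result = run(events)
for bucket, ids in result.items():
    print(bucket, ids)
```

With this window, the bucket for 10:10 receives event1, event2 and event3, while the bucket for 10:04 receives only event1. The cost is that every event is written (before + after + 1) times, so the map output grows with the window width.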

Kai

-- 
Kai Voigt
k@123.org




