hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject Re: help with hadoop program
Date Thu, 10 Jul 2008 00:05:06 GMT
It seems like this problem could be done with one map-reduce job.
From your input, map out (ID, {type, timestamp}).
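
As a rough sketch of that map step, written in plain Java without any Hadoop classes so it can run on its own: each whitespace-delimited input line is split into a key (the ID) and a value holding the type and timestamp. The names TypedEvent and EventLineParser are hypothetical, and the three-field line layout is assumed from the sample data in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical value type: the {type, timestamp} pair emitted for each ID.
final class TypedEvent {
    final String type;
    final long timestamp;

    TypedEvent(String type, long timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }
}

// Hypothetical helper standing in for the body of a map function.
final class EventLineParser {
    // Turns one "ID type timestamp" line into (ID, {type, timestamp}).
    static Map.Entry<String, TypedEvent> parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        TypedEvent value = new TypedEvent(fields[1], Long.parseLong(fields[2]));
        return new SimpleEntry<>(fields[0], value);
    }
}
```

In an actual job the key and value would be emitted through the mapper's output collector rather than returned, and the value type would need to be serializable by Hadoop.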

In your reduce, you can figure out how many A1's appear close to
each other. One naive approach is to iterate through all of the
{type, timestamp} sets and collect them in some collection class.
Then, if your custom set class implements Comparable, you can just call
Collections.sort(myList). I'm sure there are faster solutions (perhaps
you could sort them as you iterate through by hashing based on
timestamp?).
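
A minimal sketch of the naive approach, again in plain Java with no Hadoop dependency: the values for one ID are copied into a list, sorted with Collections.sort, and scanned once. The names Event and WindowCounter are hypothetical. It also assumes that "within 5 seconds" means a Y that follows the most recent X by at most the window, which is only one reading of the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value class; Comparable orders by timestamp, then type,
// so an X sorts before a Y that carries the same timestamp.
final class Event implements Comparable<Event> {
    final String type;
    final long timestamp;

    Event(String type, long timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(Event other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : type.compareTo(other.type);
    }
}

// Hypothetical helper standing in for the body of a reduce function.
final class WindowCounter {
    // Counts Y events at most windowSeconds after the most recent X
    // (assumed interpretation of "within 5 seconds of an X").
    static int countYNearX(Iterable<Event> values, long windowSeconds) {
        List<Event> sorted = new ArrayList<>();
        for (Event e : values) {
            sorted.add(e);
        }
        Collections.sort(sorted);

        int count = 0;
        boolean seenX = false;
        long lastX = 0L;
        for (Event e : sorted) {
            if (e.type.equals("X")) {
                lastX = e.timestamp;
                seenX = true;
            } else if (e.type.equals("Y") && seenX
                    && e.timestamp - lastX <= windowSeconds) {
                count++;
            }
        }
        return count;
    }
}
```

Because all values for one ID are held in memory before sorting, this only suits IDs with a modest number of events.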

Does this answer your question?

On Jul 9, 2008, at 4:59 PM, Elia Mazzawi wrote:

> can someone point me to an example i can learn from.
>
> I have a data set that looks like this:
>
> ID    type   Timestamp
>
> A1    X   1215647404
> A2    X   1215647405
> A3    X   1215647406
> A1    Y   1215647409
>
> I want to count how many A1 Y, show up within 5 seconds of an A1 X
>
> I've written a few hadoop programs already but they were based on the
> wordcount example. and so only work with 1 line at a time.
> This problem requires looking back or remembering state? or more than
> one pass?
> I was thinking that it is possible to sort the data by ID, timestamp.
> then in that case the program only needs to look back a few lines at a time?
>
> seems like a common problem so i thought I'd ask if there was an example
> that is close to that or if someone has written something already.
>
> P.S. Hadoop Rocks!

