hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: restoring state via map reduce
Date Mon, 20 Aug 2007 23:30:37 GMT


This is nicely handled using Pig.  If you want to use raw map-reduce, then
writing the Pig program first is still instructive.

  x = group input by ip
  foreach x generate flatten(computeState(*))

What is happening here is that in the first line, your input is being mapped
with the ip address being extracted as the reduce key.  The foreach defines
the reduce function, which calls your function computeState.  That function
should sort its input by time and emit one record per time step with the
elaborated state of the data.  All that remains is for you to order your
output by time.
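
For readers who want to see the same steps outside of Pig, here is a rough
sketch in plain Python that imitates the map, group and reduce phases in
memory.  It is not Hadoop code and not the actual computeState from this
thread: the names map_record, compute_state, restore_state, DEFAULTS and
SAMPLE are made up for illustration, and the default value of 0 for a, b and
c is taken from the example data in the quoted message below.

```python
from collections import defaultdict

# Assumed defaults for fields a record has not set yet (from the example data).
DEFAULTS = {"a": 0, "b": 0, "c": 0}

# The sample input from the quoted message, one dict per record.
SAMPLE = [
    {"time": 1, "ip": 1, "a": 1},
    {"time": 2, "ip": 2, "a": 2},
    {"time": 3, "ip": 2, "b": 4},
    {"time": 2, "ip": 1, "b": 2},
    {"time": 4, "ip": 1, "a": 4},
    {"time": 5, "ip": 2, "a": 7},
    {"time": 6, "ip": 1, "c": 9},
    {"time": 7, "ip": 2, "c": 11},
]


def map_record(record):
    """Map step: emit (ip, record) so that ip becomes the reduce key."""
    return record["ip"], record


def compute_state(ip, records):
    """Reduce step: sort one ip's records by time and carry the state forward."""
    state = dict(DEFAULTS)
    for rec in sorted(records, key=lambda r: r["time"]):
        for field, value in rec.items():
            if field not in ("time", "ip"):
                state[field] = value
        # One output record per time step, with the full state at that time.
        yield {"time": rec["time"], "ip": ip, **state}


def restore_state(records):
    """Imitate map, shuffle (group by key) and reduce, then order by time."""
    groups = defaultdict(list)
    for rec in records:
        key, value = map_record(rec)
        groups[key].append(value)
    output = []
    for ip, recs in groups.items():
        output.extend(compute_state(ip, recs))
    # Final ordering by time (ip is only used to break ties).
    return sorted(output, key=lambda r: (r["time"], r["ip"]))


if __name__ == "__main__":
    for row in restore_state(SAMPLE):
        print(row)
```

In a real job the grouping is done by the framework's shuffle, and the final
ordering by time would be a separate sort step over the reduce output.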

On 8/20/07 1:54 PM, "Torsten Curdt" <tcurdt@apache.org> wrote:

> I am wondering what the most efficient way would be to handle the
> following scenario with map reduce in hadoop. Let's say we have the
> following data
> 
>    time=1, ip=1, a=1
>    time=2, ip=2, a=2
>    time=3, ip=2, b=4
>    time=2, ip=1, b=2
>    time=4, ip=1, a=4
>    time=5, ip=2, a=7
>    time=6, ip=1, c=9
>    time=7, ip=2, c=11
> 
> These basically represent timestamped requests from different IPs
> providing certain values. More readable like this:
> 
>    time=1, ip=1, a=1
>    time=2, ip=1, b=2
>    time=4, ip=1, a=4
>    time=6, ip=1, c=9
> 
>    time=2, ip=2, a=2
>    time=3, ip=2, b=4
>    time=5, ip=2, a=7
>    time=7, ip=2, c=11
> 
> I now would like to re-create the state in time of all the different
> values:
> 
>    time=1, ip=1, a=1, [b=0, c=0]
>    time=2, ip=2, a=2, [b=0, c=0]
>    time=3, ip=2, a=2, b=4, [c=0]
>    time=2, ip=1, a=1, b=2, [c=0]
>    time=4, ip=1, a=4, b=2, [c=0]
>    time=5, ip=2, a=7, b=4, [c=0]
>    time=6, ip=1, a=4, b=2, c=9
>    time=7, ip=2, a=7, b=4, c=11
> 
> [] = implicit default value
> 
> Or for better reading:
> 
>    time=1, ip=1, a=1, b=0, c=0
>    time=2, ip=1, a=1, b=2, c=0
>    time=4, ip=1, a=4, b=2, c=0
>    time=6, ip=1, a=4, b=2, c=9
> 
>    time=2, ip=2, a=2, b=0, c=0
>    time=3, ip=2, a=2, b=4, c=0
>    time=5, ip=2, a=7, b=4, c=0
>    time=7, ip=2, a=7, b=4, c=11
> 
> So my fellow map-reduce writers ..how would one tackle this best?
> Suggestions?
> 
> cheers
> --
> Torsten

