hadoop-user mailing list archives

From Sarath <sarathchandra.jos...@algofusiontech.com>
Subject Re: Cumulative value using mapreduce
Date Fri, 19 Oct 2012 06:03:17 GMT
Hi Yong,

Could you share more details about the HIVE UDF you have written for 
this use case?
As suggested, I would like to try this approach and see if that 
simplifies the solution to my requirement.

~Sarath.


On Friday 05 October 2012 12:32 AM, java8964 java8964 wrote:
> I did the cumulative sum in a HIVE UDF, as one of the projects for my 
> employer.
>
> 1) You need to decide the grouping elements for your cumulative sum. For 
> example, an account, a department, etc. In the mapper, combine this 
> information into your emit key.
> 2) If you don't have any grouping requirement and just want a 
> cumulative sum over all your data, then send all the data to one common 
> key, so it all goes to the same reducer.
> 3) When you calculate the cumulative sum, does the output need to have 
> a sorting order? If so, you need a secondary sort, so the data 
> arrives in the reducer in the order you want.
> 4) In the reducer, just do the sum and emit a value for every original 
> record (not one per key).
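The four steps above can be sketched in plain Java. This is a toy stand-in for the Hadoop Reducer (not Yong's actual UDF, which is not shown in the thread): for one grouping key, the values are assumed to arrive already ordered thanks to the secondary sort, and one output record is emitted per input record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch of the reduce step: for one grouping key (e.g. an account),
// the values arrive already ordered (via secondary sort) and we emit one
// output record per input record, carrying the running total so far.
public class CumulativeSum {
    static List<String> reduce(String key, List<Long> sortedAmounts) {
        List<String> out = new ArrayList<>();
        long running = 0;
        for (long amount : sortedAmounts) {
            running += amount;  // accumulate across records within the key
            out.add(key + "\t" + amount + "\t" + running);
        }
        return out;
    }

    public static void main(String[] args) {
        // One reducer group: account "acct-1" with three ordered amounts.
        for (String row : reduce("acct-1", Arrays.asList(100L, 50L, 25L))) {
            System.out.println(row);
        }
    }
}
```

In real Hadoop code the same loop body would live inside `Reducer.reduce()`, with the running total reset per key because each call handles one group.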
>
> I suggest you do this in a HIVE UDF, as it is much easier, if 
> you can build a HIVE schema on top of your data.
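Yong's custom UDF is not shown in the thread. For reference, Hive versions 0.11 and later (released after this 2012 exchange) ship a built-in windowed aggregate that expresses the same computation; the table and column names below are assumptions for illustration:

```sql
-- Hypothetical table: transactions(account, txn_ts, amount)
SELECT account, txn_ts, amount,
       SUM(amount) OVER (PARTITION BY account
                         ORDER BY txn_ts
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         AS running_total
FROM transactions;
```

The `PARTITION BY` clause plays the role of Yong's grouping key, and `ORDER BY` plays the role of the secondary sort.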
>
> Yong
>
> ------------------------------------------------------------------------
> From: tdunning@maprtech.com
> Date: Thu, 4 Oct 2012 18:52:09 +0100
> Subject: Re: Cumulative value using mapreduce
> To: user@hadoop.apache.org
>
> Bertrand is almost right.
>
> The only difference is that the original poster asked about cumulative 
> sum.
>
> This can be done in the reducer exactly as Bertrand described, except 
> for two points that make it different from word count:
>
> a) you can't use a combiner
>
> b) the output of the program is as large as the input so it will have 
> different performance characteristics than aggregation programs like 
> wordcount.
>
> Bertrand's key recommendation to go read a book is the most important 
> advice.
>
> On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <dechouxb@gmail.com 
> <mailto:dechouxb@gmail.com>> wrote:
>
>     Hi,
>
>     It sounds like a
>     1) group information by account
>     2) compute sum per account
>
>     If that's not the case, you should be more precise about your
>     context.
>
>     This computation looks like a small variant of wordcount. If you do
>     not know how to do it, you should read books about Hadoop
>     MapReduce and/or an online tutorial. Yahoo's is old but still a nice
>     read to begin with: http://developer.yahoo.com/hadoop/tutorial/
>
>     Regards,
>
>     Bertrand
>
>
>     On Thu, Oct 4, 2012 at 3:58 PM, Sarath
>     <sarathchandra.josyam@algofusiontech.com
>     <mailto:sarathchandra.josyam@algofusiontech.com>> wrote:
>
>         Hi,
>
>         I have a file which has some financial transaction data. Each
>         transaction will have amount and a credit/debit indicator.
>         I want to write a mapreduce program which computes cumulative
>         credit & debit amounts at each record
>         and append these values to the record before dumping into the
>         output file.
>
>         Is this possible? How can I achieve this? Where should I put
>         the logic for computing the cumulative values?
>
>         Regards,
>         Sarath.
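Ignoring the distributed plumbing for a moment, the per-record computation Sarath describes could look like the sketch below. The `id,amount,C|D` line layout is an assumption made here purely for illustration; the thread does not specify the file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sequential sketch of the requested computation: keep running credit and
// debit totals and append both to each transaction record before output.
// The "id,amount,C|D" CSV layout is assumed for illustration only.
public class CumulativeCreditDebit {
    static List<String> annotate(List<String> records) {
        List<String> out = new ArrayList<>();
        double credit = 0, debit = 0;
        for (String rec : records) {
            String[] f = rec.split(",");
            double amount = Double.parseDouble(f[1]);
            if ("C".equals(f[2])) credit += amount; else debit += amount;
            // Append cumulative credit and debit to the original record.
            out.add(rec + "," + credit + "," + debit);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("t1,100.0,C", "t2,40.0,D", "t3,60.0,C");
        annotate(in).forEach(System.out::println);
    }
}
```

In MapReduce terms, this loop is what the reducer would run per group, which is why the replies stress that the records must reach one reducer in the desired order (via a common key or a secondary sort).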
>
>
>
>
>     -- 
>     Bertrand Dechoux
>
>
