hadoop-user mailing list archives

From Sarath <sarathchandra.jos...@algofusiontech.com>
Subject Re: Cumulative value using mapreduce
Date Fri, 05 Oct 2012 04:56:43 GMT
Thanks for all your responses. As suggested, I will go through the
documentation once again.

But just to clarify, this is not my first map-reduce program. I've
already written a map-reduce job for our product which does filtering
and transformation of the financial data. This is a new requirement
we've got. I have also written the logic for calculating the cumulative
sums, but the output is not coming out as desired and I feel I'm not
doing it the right way and am missing something. So I thought of asking
the mailing list for some quick help.

As an example, say we have records as below -
Txn ID    Txn Date     Cr/Dr Indicator    Amount
1001      9/22/2012    CR                 1000
1002      9/25/2012    DR                 500
1003      10/1/2012    DR                 1500
1004      10/4/2012    CR                 2000

When this file is processed, the logic should append the two columns
below to the output for each record above:
CR Cumulative Amount    DR Cumulative Amount
1000                    0
1000                    500
1000                    2000
3000                    2000


Hope the problem is clear now. Please provide your suggestions on the 
approach to the solution.
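
For illustration, here is a minimal sketch of one way such a job could
look. It is simplified and untested, the class and field names are just
placeholders, and it assumes that every record is routed through a
single reducer and reaches it already sorted by transaction date (for
example via a secondary sort):

// Minimal sketch only, not the actual job. It assumes tab-delimited
// input lines of the form txnId<TAB>txnDate<TAB>crDrIndicator<TAB>amount,
// that all records go through a single reducer, and that they arrive
// there already ordered by transaction date (e.g. via a secondary sort).
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CumulativeSumSketch {

    // Map every record to one constant key so a single reducer sees the
    // whole file and can keep running totals across records.
    public static class TxnMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text ALL = new Text("ALL");

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(ALL, line);
        }
    }

    // Keep running CR and DR totals and emit one output line per input
    // record, with the two cumulative columns appended at the end.
    public static class CumulativeReducer
            extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            long crTotal = 0;
            long drTotal = 0;
            for (Text record : records) {
                String[] fields = record.toString().split("\t");
                long amount = Long.parseLong(fields[3]);
                if ("CR".equals(fields[2])) {
                    crTotal += amount;
                } else {
                    drTotal += amount;
                }
                ctx.write(record, new Text(crTotal + "\t" + drTotal));
            }
        }
    }
}

The part I am least sure about is how to guarantee that the reducer
really sees the records in date order, which is where the secondary
sort mentioned in the quoted replies below would come in.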

Regards,
Sarath.

On Friday 05 October 2012 02:51 AM, Bertrand Dechoux wrote:
> I indeed didn't catch the cumulative sum part. Then I guess it begs 
> for what-is-often-called-a-secondary-sort, if you want to compute 
> different cumulative sums during the same job. It can be more or less 
> easy to implement depending on which API/library/tool you are using. 
> Ted's comments on performance are spot on.
>
> Regards
>
> Bertrand
>
> On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 
> <java8964@hotmail.com <mailto:java8964@hotmail.com>> wrote:
>
>     I did the cumulative sum in the HIVE UDF, as one of the project
>     for my employer.
>
>     1) You need to decide the grouping elements for your cumulative
>     sum. For example, an account, a department, etc. In the mapper,
>     combine this information into your emitted key.
>     2) If you don't have any grouping requirement and just want a
>     cumulative sum over all your data, then send all the data to one
>     common key, so it will all go to the same reducer.
>     3) When you calculate the cumulative sum, does the output need to
>     be in a particular order? If so, you need to do a secondary sort,
>     so the data arrives in the reducer in the order you want.
>     4) In the reducer, just do the sum and emit a value for every
>     original record (not one per key). A rough sketch of steps 1 and
>     3 is included below.
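>
>     For illustration only, a rough sketch of steps 1 and 3, using a
>     composite "account#date" key plus a grouping comparator. The field
>     positions and the "#" separator are assumptions, and the matching
>     Partitioner that hashes only the account part is omitted for
>     brevity:
>
>     import java.io.IOException;
>
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.io.WritableComparable;
>     import org.apache.hadoop.io.WritableComparator;
>     import org.apache.hadoop.mapreduce.Mapper;
>
>     public class SecondarySortSketch {
>
>         // Emit "account#date" as the key; the whole record stays in
>         // the value. The date would need a sortable format such as
>         // yyyy-MM-dd for the key ordering to match date order.
>         public static class GroupedTxnMapper
>                 extends Mapper<LongWritable, Text, Text, Text> {
>             @Override
>             protected void map(LongWritable offset, Text line, Context ctx)
>                     throws IOException, InterruptedException {
>                 String[] f = line.toString().split("\t");
>                 String account = f[0];  // assumed field layout
>                 String date = f[1];
>                 ctx.write(new Text(account + "#" + date), line);
>             }
>         }
>
>         // Group keys by the account part only, so the date part only
>         // decides the order of values within one reduce() call.
>         public static class AccountGroupingComparator
>                 extends WritableComparator {
>             public AccountGroupingComparator() {
>                 super(Text.class, true);
>             }
>
>             @Override
>             public int compare(WritableComparable a, WritableComparable b) {
>                 String accountA = a.toString().split("#")[0];
>                 String accountB = b.toString().split("#")[0];
>                 return accountA.compareTo(accountB);
>             }
>         }
>     }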
>
>     I would suggest you do this in a HIVE UDF, as it is much easier,
>     if you can build a HIVE schema on top of your data.
>
>     Yong
>
>     ------------------------------------------------------------------------
>     From: tdunning@maprtech.com <mailto:tdunning@maprtech.com>
>     Date: Thu, 4 Oct 2012 18:52:09 +0100
>     Subject: Re: Cumulative value using mapreduce
>     To: user@hadoop.apache.org <mailto:user@hadoop.apache.org>
>
>
>     Bertrand is almost right.
>
>     The only difference is that the original poster asked about
>     cumulative sum.
>
>     This can be done in the reducer exactly as Bertrand described except
>     for two points that make it different from word count:
>
>     a) you can't use a combiner
>
>     b) the output of the program is as large as the input so it will
>     have different performance characteristics than aggregation
>     programs like wordcount.
>
>     Bertrand's key recommendation to go read a book is the most
>     important advice.
>
>     On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux
>     <dechouxb@gmail.com <mailto:dechouxb@gmail.com>> wrote:
>
>         Hi,
>
>         It sounds like a two-step job:
>         1) group information by account
>         2) compute the sum per account
>
>         If that is not the case, please be a bit more precise about
>         your context.
>
>         This computation looks like a small variant of wordcount. If
>         you do not know how to do it, you should read a book about
>         Hadoop MapReduce and/or an online tutorial. Yahoo's is old but
>         still a nice read to begin with:
>         http://developer.yahoo.com/hadoop/tutorial/
>
>         Regards,
>
>         Bertrand
>
>
>         On Thu, Oct 4, 2012 at 3:58 PM, Sarath
>         <sarathchandra.josyam@algofusiontech.com
>         <mailto:sarathchandra.josyam@algofusiontech.com>> wrote:
>
>             Hi,
>
>             I have a file which has some financial transaction data.
>             Each transaction will have an amount and a credit/debit
>             indicator.
>             I want to write a mapreduce program which computes
>             cumulative credit & debit amounts at each record
>             and appends these values to the record before dumping it
>             into the output file.
>
>             Is this possible? How can I achieve this? Where should I
>             put the logic of computing the cumulative values?
>
>             Regards,
>             Sarath.
>
>
>
>
>         -- 
>         Bertrand Dechoux
>
>
>
>
>
> -- 
> Bertrand Dechoux
