hadoop-hdfs-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: Cumulative value using mapreduce
Date Thu, 04 Oct 2012 19:02:06 GMT

I did the cumulative sum in a Hive UDF, as one of the projects for my employer.

1) Decide the grouping elements for your cumulative sum. For example, an account,
a department, etc. In the mapper, combine this information into your emit key.
2) If you don't have any grouping requirement and you just want a cumulative sum over
all your data, then send all the data to one common key, so it will all go to the same reducer.
3) When you calculate the cumulative sum, does the output need a particular sort order? If so,
use a secondary sort, so the data arrives in the reducer in the order you want.
4) In the reducer, just do the sum, emitting one value per original record (not one per key).

I suggest you do this in a Hive UDF, as it is much easier, if you can build a Hive
schema on top of your data.
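As a rough illustration of steps 1-4 above, the reducer-side logic can be simulated in plain Java outside Hadoop. The class, record layout, and field names here are my own invention, and the in-memory grouping and sort stand in for what the shuffle and a secondary sort would do in a real job:

```java
import java.util.*;

public class CumulativeSumSketch {
    // One input record: group key (e.g. account), a sort field, and an amount.
    record Txn(String account, int seq, long amount) {}

    // Simulates what the reducer sees after the shuffle: records grouped by
    // key and (via secondary sort) ordered by seq. Emits one running total
    // per input record, not one per key.
    static Map<String, List<Long>> cumulativePerAccount(List<Txn> input) {
        Map<String, List<Txn>> grouped = new TreeMap<>();
        for (Txn t : input) {
            grouped.computeIfAbsent(t.account(), k -> new ArrayList<>()).add(t);
        }
        Map<String, List<Long>> out = new TreeMap<>();
        for (var e : grouped.entrySet()) {
            // Stands in for the secondary sort on the sort field.
            e.getValue().sort(Comparator.comparingInt(Txn::seq));
            long running = 0;
            List<Long> sums = new ArrayList<>();
            for (Txn t : e.getValue()) {
                running += t.amount();
                sums.add(running); // one output value per original record
            }
            out.put(e.getKey(), sums);
        }
        return out;
    }
}
```

In a real MapReduce job the grouping would come from the emit key and the ordering from a secondary sort comparator; this sketch only shows the per-record running-sum shape of the reducer.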

From: tdunning@maprtech.com
Date: Thu, 4 Oct 2012 18:52:09 +0100
Subject: Re: Cumulative value using mapreduce
To: user@hadoop.apache.org

Bertrand is almost right.
The only difference is that the original poster asked about cumulative sum.
This can be done in the reducer exactly as Bertrand described, except for two points that make
it different from word count:

a) you can't use a combiner
b) the output of the program is as large as the input, so it will have different performance
characteristics than aggregation programs like wordcount.
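Point (a) can be seen with a small simulation (method names are mine, not from the thread): a word-count-style combiner collapses each key's values into a partial sum before the reducer, which destroys the per-record stream a running total needs:

```java
import java.util.*;

public class NoCombinerDemo {
    // Running totals over the raw value stream: one output per input value.
    static List<Long> cumulative(List<Long> values) {
        List<Long> out = new ArrayList<>();
        long running = 0;
        for (long v : values) {
            running += v;
            out.add(running);
        }
        return out;
    }

    // What a word-count-style combiner would do: collapse a key's values
    // into a single partial sum before they reach the reducer.
    static List<Long> combine(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return List.of(sum);
    }
}
```

Running the combiner first turns the stream [1, 2, 3] into [6], so the reducer can no longer produce the running totals [1, 3, 6]; that is why the combiner must be skipped for cumulative sums, even though it is safe for word count.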

Bertrand's key recommendation to go read a book is the most important advice.

On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:

It sounds like:
1) group information by account
2) compute the sum per account

If that's not the case, please be more precise about your context.
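The two steps above are plain per-key aggregation, wordcount-style. A minimal in-memory sketch (class and method names are my own):

```java
import java.util.*;

public class SumPerAccount {
    // Word-count-style aggregation: one (account, total) pair per account,
    // regardless of how many input records each account has.
    static Map<String, Long> sumPerAccount(List<Map.Entry<String, Long>> records) {
        Map<String, Long> totals = new TreeMap<>();
        for (var r : records) {
            totals.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return totals;
    }
}
```

In MapReduce terms, the mapper emits (account, amount) pairs and the reducer sums the values for each account; unlike a cumulative sum, this aggregation is also safe to pre-compute in a combiner.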

This computation looks like a small variant of wordcount. If you do not know how to do it, you
should read a book about Hadoop MapReduce and/or an online tutorial. Yahoo's is old but still
a nice read to begin with: http://developer.yahoo.com/hadoop/tutorial/


On Thu, Oct 4, 2012 at 3:58 PM, Sarath <sarathchandra.josyam@algofusiontech.com> wrote:


I have a file which has some financial transaction data. Each transaction will have an amount
and a credit/debit indicator.

I want to write a mapreduce program which computes cumulative credit & debit amounts at
each record, and appends these values to the record before dumping it into the output file.

Is this possible? How can I achieve this? Where should I put the logic of computing the cumulative
sums?



Bertrand Dechoux
