From: java8964 <java8964@hotmail.com>
To: user@hadoop.apache.org
Subject: RE: Cumulative value using mapreduce
Date: Thu, 4 Oct 2012 15:02:06 -0400
I did the cumulative sum in a Hive UDF, as one of the projects for my employer.

1) You need to decide the grouping elements for your cumulative sum. For example, an account, a department, etc. In the mapper, combine this information into the key you emit.
2) If you don't have any grouping requirement and just want a cumulative sum over all your data, then send all the data to one common key, so it will all go to the same reducer.
3) When you calculate the cumulative sum, does the output need to be in a particular order? If so, you need a secondary sort, so the data arrives at the reducer in the order you want.
4) In the reducer, just do the sum, emitting a value for every original record (not one per key).
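The four steps above can be sketched in plain Python (no Hadoop involved; the record layout and field names are invented for illustration): the mapper emits a composite (group, sort) key, the shuffle groups and secondary-sorts, and the reducer emits one running total per record.

```python
from itertools import groupby

def mapper(records):
    # Step 1: the grouping element (here: account) becomes the emitted key,
    # plus a sort field (here: a sequence number) for the secondary sort.
    for rec in records:
        yield (rec["account"], rec["seq"]), rec["amount"]

def shuffle_and_sort(pairs):
    # Steps 2-3: everything with the same group key reaches the same reducer,
    # already ordered by the secondary (sort) part of the key.
    return sorted(pairs, key=lambda kv: kv[0])

def reducer(sorted_pairs):
    # Step 4: emit a value for every original record, carrying a running sum.
    for _, kvs in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        total = 0
        for (account, seq), amount in kvs:
            total += amount
            yield account, seq, total

records = [
    {"account": "A", "seq": 2, "amount": 30},
    {"account": "A", "seq": 1, "amount": 10},
    {"account": "B", "seq": 1, "amount": 5},
]
out = list(reducer(shuffle_and_sort(mapper(records))))
# Each record now carries its cumulative sum within its account:
# [('A', 1, 10), ('A', 2, 40), ('B', 1, 5)]
```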

I would suggest you do this in a Hive UDF, as it is much easier, if you can build a Hive schema on top of your data.

Yong


From: tdunning@maprtech.com
Date: Thu, 4 Oct 2012 18:52:09 +0100
Subject: Re: Cumulative value using mapreduce
To: user@hadoop.apache.org

Bertrand is almost right.

The only difference is that the original poster asked about a cumulative sum.

This can be done in the reducer exactly as Bertrand described, except for two points that make it different from word count:

a) you can't use a combiner

b) the output of the program is as large as the input, so it will have different performance characteristics than aggregation programs like wordcount.
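Both points can be seen in a toy illustration (plain Python, invented values): a word-count-style sum combiner collapses the per-record values that a running sum still needs, and the per-record output is exactly as long as the input.

```python
values = [10, 30, 5]  # map outputs for one key, one per input record

# Without a combiner the reducer sees every record and can emit one
# running total per record -- output is as large as the input (point b).
running = []
total = 0
for v in values:
    total += v
    running.append(total)   # [10, 40, 45]

# A word-count-style combiner would pre-aggregate on the map side,
# leaving the reducer a single value and no way to recover the
# intermediate totals 10 and 40 (point a).
combined = [sum(values)]    # [45]
```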

Bertrand's key recommendation to go read a book is the most important advice.

On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
Hi=2C

It sounds like:
1) group information by account
2) compute sum per account
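The two steps above can be sketched in plain Python (field names invented): group by account, then sum per account. Structurally this is the same job as word count, with amounts emitted instead of 1s.

```python
from collections import defaultdict

transactions = [("acct1", 100), ("acct2", 50), ("acct1", 25)]

totals = defaultdict(int)              # reduce: sum per key
for account, amount in transactions:   # map: emit (account, amount)
    totals[account] += amount
# totals == {'acct1': 125, 'acct2': 50}
```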

If that's not the case, you should be a bit more precise about your context.

This computation looks like a small variant of wordcount. If you do not know how to do it, you should read books about Hadoop MapReduce and/or online tutorials. Yahoo's is old but still a nice read to begin with: http://developer.yahoo.com/hadoop/tutorial/

Regards,

Bertrand


On Thu, Oct 4, 2012 at 3:58 PM, Sarath <sarathchandra.josyam@algofusiontech.com> wrote:
Hi=2C

I have a file which has some financial transaction data. Each transaction will have an amount and a credit/debit indicator.
I want to write a mapreduce program which computes cumulative credit & debit amounts at each record
and appends these values to the record before dumping it into the output file.
Is this possible? How can I achieve this? Where should I put the logic of computing the cumulative values?

Regards,
Sarath.
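One way to sketch what is being asked (single process, invented record format): walk the transactions in order, keep running credit and debit totals, and append both to each record before writing it out. In MapReduce the same loop would live in a reducer fed by a secondary sort, as discussed in the replies above.

```python
def append_cumulative(transactions):
    # transactions: iterable of (amount, indicator) with "C" = credit,
    # anything else = debit; yields each record with both running totals.
    credit = debit = 0
    for amount, indicator in transactions:
        if indicator == "C":
            credit += amount
        else:
            debit += amount
        yield amount, indicator, credit, debit

rows = [(100, "C"), (40, "D"), (60, "C")]
out = list(append_cumulative(rows))
# [(100, 'C', 100, 0), (40, 'D', 100, 40), (60, 'C', 160, 40)]
```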



--
Bertrand Dechoux
