hadoop-common-user mailing list archives

From "Tarandeep Singh" <tarand...@gmail.com>
Subject Question - Sum of a col in log file - memory requirement in reducer
Date Tue, 27 May 2008 21:53:53 GMT

Is it correct that an intermediate key from a mapper goes to only one reducer?
If so, then to sum up the values of a column in a log file, a
reducer will consume a lot of memory -

I have a simple requirement - to sum up the values of one of the
columns in the log files.
Suppose the log file has this structure -

C1 C2 C3

and I want the sum of all C2 values..

I thought of this MapReduce solution -

Map:  Key = 1
      Value = C2

At Reduce I will get - ( 1, [all C2 values] )
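To make the proposed job concrete, here is a minimal sketch in plain Python (outside Hadoop) of the map and reduce steps described above. The record format and sample values are illustrative assumptions, not from the original message:

```python
# Sketch of the proposed job: map emits (1, C2), reduce sums the
# values grouped under the single constant key. Pure Python stand-in
# for Hadoop; the three-column record layout is an assumption.
from collections import defaultdict

def map_record(line):
    # Emit the constant key 1 with the C2 value (second column).
    c1, c2, c3 = line.split()
    yield 1, int(c2)

def reduce_values(key, values):
    # The reducer for key 1 receives every C2 value in the input.
    return key, sum(values)

log = ["a 10 x", "b 20 y", "c 30 z"]  # hypothetical log lines

groups = defaultdict(list)
for line in log:
    for k, v in map_record(line):
        groups[k].append(v)

results = {k: reduce_values(k, vs)[1] for k, vs in groups.items()}
print(results)  # {1: 60}
```

Because every record maps to the same key, the shuffle sends the entire column to one reducer, which is exactly the concern raised below.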

Now my question is: if the log file has billions of records, this
effectively means one reducer will receive a list of billions of values.
How is the reducer going to handle such a huge amount of data?
If my approach is wrong, please suggest an alternative.
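One common remedy (not part of the original message) is a combiner: because addition is associative and commutative, each map task can pre-sum its own C2 values locally, so the single reducer receives only one partial sum per map task instead of one value per record. In Hadoop, a sum reducer can typically be reused as the combiner. A plain-Python sketch of the idea, with hypothetical input splits:

```python
# Combiner-style pre-aggregation, simulated in plain Python.
# Splits and column layout are illustrative assumptions.

def local_sum(split):
    # Combiner: pre-sum C2 (second column) within one map task's split.
    return sum(int(line.split()[1]) for line in split)

# Two hypothetical input splits, one per map task.
splits = [["a 10 x", "b 20 y"], ["c 30 z", "d 40 w"]]

partials = [local_sum(s) for s in splits]  # one partial sum per map task
total = sum(partials)                      # reducer adds a few partials
print(total)  # 100
```

With this structure the reducer's input size scales with the number of map tasks, not the number of records, so memory is no longer a concern for a simple column sum.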

