hadoop-hdfs-user mailing list archives

From jeremy p <athomewithagroove...@gmail.com>
Subject How to do aggregate operations with multiple reducers
Date Fri, 29 Nov 2013 22:37:19 GMT
Hey all,

So, I'm writing a module where I need to do aggregate operations over the
entire set of data.  However, I also want to use multiple reducers.

For example, let's say each row of input data looks like this:


Let's say my mapper outputs DATE_TIME as a key, and the temperature as a
value.  In this example, I'm using 3 reducers, which should create three
output files. I want to find the day with the highest temperature in the
entire data set.

I know I could just write a script that examines the output from the
reducers and picks out the value with the highest temperature.  I could
also write a mapreduce job that does the same thing, and chain the two jobs
together.  However, these solutions seem kinda wrong to me.
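
To make the first option concrete, here is a minimal sketch in plain Python that simulates it without Hadoop: each of the three reducers keeps only its local maximum, and a final pass picks the largest of those. The names `partition`, `reduce_local_max`, and `global_max`, the byte-sum partitioner, and the sample rows are all hypothetical stand-ins for illustration, not Hadoop APIs or real data.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # matches the three reducers in the example above


def partition(key, num_reducers=NUM_REDUCERS):
    # Hypothetical stand-in for a hash partitioner: deterministic, key-based.
    return sum(key.encode("utf-8")) % num_reducers


def reduce_local_max(pairs):
    # One reducer: keep only the hottest (date_time, temperature) pair it saw.
    return max(pairs, key=lambda kv: kv[1])


def global_max(mapped_pairs, num_reducers=NUM_REDUCERS):
    # Shuffle: route every mapper output pair to a reducer by its key.
    buckets = defaultdict(list)
    for key, temp in mapped_pairs:
        buckets[partition(key, num_reducers)].append((key, temp))
    # Each non-empty reducer emits one candidate; this would be its output file.
    local_maxima = [reduce_local_max(pairs) for pairs in buckets.values()]
    # Final pass: the post-processing script (or second job) over the candidates.
    return max(local_maxima, key=lambda kv: kv[1])


# Made-up sample rows, for illustration only.
sample = [("2013-11-27", 51.0), ("2013-11-28", 63.5), ("2013-11-29", 58.2)]
print(global_max(sample))
```

The final pass only ever sees one candidate per reducer, so it stays small no matter how large the input is.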

What's the commonly-accepted best way to do this?

