hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Best practices for jobs with large Map output
Date Thu, 14 Apr 2011 18:50:08 GMT
Hello Shai,

On Fri, Apr 15, 2011 at 12:01 AM, Shai Erera <serera@gmail.com> wrote:
> Hi
> I'm running on Hadoop 0.20.2 and I have a job with the following nature:
> * Mapper outputs very large records (50 to 200 MB)
> * Reducer (single) merges all those records together
> * Map output key is a constant (could be a NullWritable, but currently it's
> a LongWritable(1))
> * Reducer doesn't care about the keys at all

If I understand right, your single reducer's only job is to merge the
large records emitted by your mappers, and nothing else (it does not
have keys to worry about), correct?

Why not run a map-only job instead, and then merge its materialized
output files yourself with an ordinary filesystem program that writes
everything to a single file?
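To illustrate the idea, here is a minimal sketch of such a merge program. It uses plain local-filesystem paths via `java.nio.file` for simplicity; against HDFS you would do the same thing through Hadoop's `FileSystem` API (or use `hadoop fs -getmerge`), but the structure is the same. The class and method names here are just illustrative, not from any Hadoop API:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeParts {
    // Concatenate all part-* files in a job output directory into one file,
    // in deterministic order (part-00000, part-00001, ...).
    public static void merge(Path outputDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : ds) {
                parts.add(p);
            }
        }
        Collections.sort(parts);
        try (OutputStream out =
                 new BufferedOutputStream(Files.newOutputStream(merged))) {
            for (Path p : parts) {
                Files.copy(p, out); // stream each part file into the target
            }
        }
    }

    public static void main(String[] args) throws IOException {
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Since the records are 50-200 MB each, streaming copies like this avoid ever holding a whole record in memory, which is exactly where a single reducer tends to hurt.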

Harsh J
