hadoop-common-user mailing list archives

From Devin Suiter RDX <dsui...@rdx.com>
Subject Re: Map-Reduce: How to make MR output one file an hour?
Date Sat, 01 Mar 2014 12:48:12 GMT
If you only want one file, then you need to set the number of reducers to 1.

If the data is too large for the original MR job to run with a single
reducer, run a second job on the output of the first, using the default
mapper and reducer (the identity ones), and set the number of reducers for
that second job to 1.

Or use the HDFS getmerge command (hdfs dfs -getmerge <src> <localdst>) to
collate the results into one file on the local filesystem.
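For illustration only, here is a rough local-filesystem analogue of what getmerge does: concatenate the reducer output files of one job into a single file. This is plain Java, not Hadoop's implementation; the class name LocalGetMerge, the filter that keeps only part-* files, and the ordering by file name are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local analogue of "hdfs dfs -getmerge"; not Hadoop's own code.
class LocalGetMerge {

    // Concatenates the part-* files in srcDir, sorted by file name, into dst.
    // Skipping other files (e.g. _SUCCESS) is an assumption of this sketch.
    static void merge(Path srcDir, Path dst) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(srcDir)) {
            parts = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part file's bytes to the merged output
            }
        }
    }

    // Usage: java LocalGetMerge <job-output-dir> <merged-file>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LocalGetMerge <job-output-dir> <merged-file>");
            return;
        }
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Unlike the single-reducer job, this does no re-sorting: records appear in part-file order, which is usually what you want when each part file already holds one hour.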
On Mar 1, 2014 4:59 AM, "Fengyun RAO" <raofengyun@gmail.com> wrote:

> Thanks, but how do I set the number of reducers to X? X depends on the
> input (run time), which is unknown at job configuration (compile time).
> 2014-03-01 17:44 GMT+08:00 AnilKumar B <akumarb2010@gmail.com>:
>> Hi,
>> Write a custom partitioner on <timestamp> and, as you mentioned, set the
>> number of reducers to X.
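A minimal sketch of the hour-based partitioning, written in plain Java so it runs without Hadoop on the classpath. In a real job this logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner overriding getPartition(key, value, numPartitions), registered with job.setPartitionerClass(...), with X passed to job.setNumReduceTasks(X). Because the driver's main() runs when the job is submitted, not at compile time, one option is to compute X there. The class and method names below are hypothetical; the sketch assumes keys are epoch-millisecond timestamps and leaves open how the driver learns the earliest and latest timestamps (a prior pass, input file names, a job parameter).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; assumes keys are epoch-millisecond timestamps.
class HourPartitioning {
    static final long MILLIS_PER_HOUR = TimeUnit.HOURS.toMillis(1);

    // X for the number of reducers: hours from the earliest to the latest timestamp, inclusive.
    static int numHours(List<Long> timestampsMillis) {
        if (timestampsMillis.isEmpty()) {
            return 1; // still run one reducer for empty input
        }
        long minHour = Long.MAX_VALUE;
        long maxHour = Long.MIN_VALUE;
        for (long t : timestampsMillis) {
            long hour = Math.floorDiv(t, MILLIS_PER_HOUR);
            minHour = Math.min(minHour, hour);
            maxHour = Math.max(maxHour, hour);
        }
        return Math.toIntExact(maxHour - minHour + 1);
    }

    // Same contract as Partitioner#getPartition: the result is in [0, numPartitions).
    static int getPartition(long timestampMillis, int numPartitions) {
        long hour = Math.floorDiv(timestampMillis, MILLIS_PER_HOUR);
        return (int) Math.floorMod(hour, (long) numPartitions);
    }

    public static void main(String[] args) {
        long h = MILLIS_PER_HOUR;
        List<Long> ts = Arrays.asList(10 * h + 5, 11 * h, 12 * h + 1);
        int x = numHours(ts);
        System.out.println("reducers X = " + x);
        for (long t : ts) {
            System.out.println(t + " -> partition " + getPartition(t, x));
        }
    }
}
```

Taking the hour index modulo X means any X consecutive hours land in X distinct partitions, so each reducer (and therefore each part file) receives exactly one hour, provided X covers the whole span. Hours with no records simply produce empty part files.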
