hadoop-common-user mailing list archives

From Fengyun RAO <raofeng...@gmail.com>
Subject Re: Map-Reduce: How to make MR output one file an hour?
Date Sun, 02 Mar 2014 03:37:22 GMT
Thanks Devin. We don't just want one file. It's more complicated than that.

If the input folder contains X hours of data, we want X files;
if it contains Y hours, we want Y files.

Obviously, X and Y are unknown at compile time.
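
One way to look at the requirement above: the number of files is simply the number of distinct hours present in the input, and that can be computed when the driver runs, before the job is submitted. The sketch below is plain Java with no Hadoop dependency. The class HourBuckets, its method names, the epoch-millisecond timestamps, and the "yyyy-MM-dd-HH" UTC key format are all assumptions made for illustration; nothing in this thread specifies how the timestamps are stored.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: derives hour buckets from record timestamps.
class HourBuckets {
    // Assumed key format: one bucket per UTC hour, e.g. "2014-03-01-12".
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    // Maps an epoch-millisecond timestamp to the key of the UTC hour it falls in.
    static String hourKey(long epochMillis) {
        return HOUR_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Collects the distinct hour keys present in the input, in sorted order.
    // The size of the result is the X (or Y) discussed above.
    static SortedSet<String> distinctHours(List<Long> timestamps) {
        SortedSet<String> hours = new TreeSet<>();
        for (long ts : timestamps) {
            hours.add(hourKey(ts));
        }
        return hours;
    }
}
```

In a real driver, the size of that set could presumably be passed to Job.setNumReduceTasks(int) at submission time, since the reducer count is a run-time job setting rather than a compile-time constant. How the driver would sample or scan the input to obtain the timestamps is left open here.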

2014-03-01 20:48 GMT+08:00 Devin Suiter RDX <dsuiter@rdx.com>:

> If you only want one file, then you need to set the number of reducers to
> 1.
> If the size of the data makes a single reducer impractical for the original
> MR job, you can run a second job on the output of the first, with the
> default mapper and reducer, which are the Identity- ones, and set
> numReducers = 1 on that job.
> Or use the HDFS getmerge command (hadoop fs -getmerge) to collate the results into one file.
> On Mar 1, 2014 4:59 AM, "Fengyun RAO" <raofengyun@gmail.com> wrote:
>> Thanks, but how do we set the number of reducers to X? X depends on the
>> input (run time), so it is unknown at job configuration (compile time).
>> 2014-03-01 17:44 GMT+08:00 AnilKumar B <akumarb2010@gmail.com>:
>>> Hi,
>>> Write a custom partitioner on <timestamp> and, as you mentioned, set
>>> #reducers to X.
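
The partitioner idea quoted above could be sketched roughly as follows. This is plain Java standing in for the body of a Hadoop Partitioner.getPartition(key, value, numPartitions) implementation, not a drop-in Hadoop class. The class name HourPartitioner, the string hour keys such as "2014-03-01-12", and the assumption that the list of distinct hours is already known when the partitioner is built are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the logic of a custom Hadoop partitioner keyed on the hour.
class HourPartitioner {
    private final Map<String, Integer> indexByHour = new HashMap<>();

    // hours: the distinct hour keys expected in the input, one reducer per hour.
    // Each new hour gets the next free index; duplicates are ignored.
    HourPartitioner(List<String> hours) {
        for (String hour : hours) {
            indexByHour.putIfAbsent(hour, indexByHour.size());
        }
    }

    // Returns the reducer index for a record's hour key, always in [0, numPartitions).
    int getPartition(String hourKey, int numPartitions) {
        Integer index = indexByHour.get(hourKey);
        if (index == null) {
            // Unknown hour: fall back to a non-negative hash so the record is not lost.
            return (hourKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        return index % numPartitions;
    }
}
```

With the number of reducers equal to the number of known hours, each reducer receives exactly one hour and therefore writes one output file for it. Records from an hour missing from the list would share a reducer with a known hour, so the list needs to be complete for the one-file-per-hour property to hold.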
