hadoop-common-user mailing list archives

From Simon Dong <simond...@gmail.com>
Subject Re: Map-Reduce: How to make MR output one file an hour?
Date Sat, 01 Mar 2014 14:48:58 GMT
You can use MultipleOutputs and build a custom file name from each record's
timestamp, so that every hour's results go to their own file.

http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
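
A rough sketch of the naming part, in plain Java so it runs without Hadoop on
the classpath. The class name HourlyOutputName, both method names, the
yyyy-MM-dd-HH pattern, and the use of UTC are my assumptions, not anything
Hadoop prescribes. In a reducer you would pass the resulting string as the
baseOutputPath argument of MultipleOutputs.write(key, value, baseOutputPath).
MultipleOutputs appends a suffix such as -r-00000 per reducer, so to end up
with a single file per hour you would probably also route each hour to one
reducer, for example with a custom Partitioner built on something like
partitionFor below.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives an hour-based output name from a record timestamp.
final class HourlyOutputName {

    // One name per hour, e.g. "2014-03-01-14". UTC is an assumption;
    // use whatever zone the web logs are written in.
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    // Returns the string to use as baseOutputPath for this record.
    static String baseOutputPath(long epochMillis) {
        return HOUR_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Maps an hour name to a reducer index, so that all records of one hour
    // reach the same reducer regardless of how many reducers are configured.
    static int partitionFor(String hourName, int numReducers) {
        return (hourName.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        long ts = 1393685338000L; // Sat, 01 Mar 2014 14:48:58 GMT
        String name = baseOutputPath(ts);
        System.out.println(name + " -> reducer " + partitionFor(name, 10));
    }
}
```

With this, the number of reducers no longer has to equal the number of hours:
several hours may share a reducer, but each hour's records still land in one
file named after that hour.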


On Fri, Feb 28, 2014 at 11:44 PM, Fengyun RAO <raofengyun@gmail.com> wrote:

> This is a common web log analysis situation. The original web logs are
> saved every hour on multiple servers.
> Now we would like the parsed log results to be saved as one file per hour.
> How can we do that?
>
> In our MR job, the input is a directory with many files covering many hours,
> let's say 4X files for X hours.
> If there are, e.g., 10 reducers, then all of the results are partitioned
> into 10 files, each of which contains results from every hour.
> We would like the results to be saved in X files, each of which contains
> the results of only one hour.
> Since the input files can change, I can't even set the number of reducers
> to exactly X in the program.
>
