hadoop-common-user mailing list archives

From Fengyun RAO <raofeng...@gmail.com>
Subject Map-Reduce: How to make MR output one file an hour?
Date Sat, 01 Mar 2014 07:44:51 GMT
It's a common web log analysis situation: the original web logs are saved
every hour on multiple servers.
Now we would like the parsed log results to be saved as one file per hour.
How can we do this?

In our MR job, the input is a directory containing many files spanning many
hours, say 4X files covering X hours.
If there are, e.g., 10 reducers, all of the results are partitioned into
10 files, each of which contains results from every hour.
We would like the results to be saved in X files, each of which contains
the results for only one hour.
Since the input files can change, I can't even set the number of reducers
to exactly X in the program.
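To make the problem concrete, here is a plain-Java simulation of how records are assigned to reducers by default. It reproduces the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but runs without Hadoop. The sample data is made up for illustration: the same 50 client IPs appear in each of 3 hours, and the IP is assumed to be the map output key. Since the partition depends only on the key, every reducer that receives an IP receives it for all three hours, so each output file mixes hours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DefaultPartitionDemo {
    // Same formula as Hadoop's default HashPartitioner:
    // a non-negative hash of the map output key, modulo the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical sample: the same 50 client IPs show up in each of 3 hours.
    // Each record is {hour, mapOutputKey}.
    static List<String[]> sampleRecords() {
        List<String[]> records = new ArrayList<>();
        for (int h = 0; h < 3; h++) {
            String hour = String.format("20140301%02d", h);
            for (int i = 0; i < 50; i++) {
                records.add(new String[] {hour, "10.0.0." + i});
            }
        }
        return records;
    }

    // For each reducer (i.e. each part-r-NNNNN file), collect the hours it received.
    static Map<Integer, Set<String>> hoursPerPartition(List<String[]> records, int numReducers) {
        Map<Integer, Set<String>> result = new TreeMap<>();
        for (String[] rec : records) {
            int p = partitionFor(rec[1], numReducers);
            result.computeIfAbsent(p, k -> new TreeSet<>()).add(rec[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> m = hoursPerPartition(sampleRecords(), 10);
        m.forEach((p, hours) -> System.out.println("reducer " + p + " -> hours " + hours));
    }
}
```

Every non-empty reducer in this simulation ends up with all three hours, which matches the behaviour described above.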
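One way to frame the requirement is to derive the output name from each record's own hour rather than from the reducer index. That way, the number of outputs follows the number of distinct hours in the input, not the reducer count. The sketch below is plain Java under stated assumptions. The `LogRecord` type, the `hourPath` helper and the `hour=yyyyMMddHH/part` naming scheme are hypothetical, and timestamps are assumed to be UTC epoch milliseconds. In a real job, a reducer would typically pass such a string as the `baseOutputPath` argument of `MultipleOutputs.write(key, value, baseOutputPath)`, and Hadoop appends a task-specific suffix to that base name. Note that this yields one file per hour *per reducer that sees that hour*. Getting exactly one file per hour would additionally require partitioning on the hour.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HourlyOutputSketch {
    // Hypothetical naming scheme: one output path per UTC hour, e.g. "hour=2014030107/part".
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    // A parsed log line together with its timestamp (hypothetical type).
    static final class LogRecord {
        final long epochMillis;
        final String line;

        LogRecord(long epochMillis, String line) {
            this.epochMillis = epochMillis;
            this.line = line;
        }
    }

    // Derives the output path from the record's own hour, not from the reducer index.
    static String hourPath(long epochMillis) {
        return "hour=" + HOUR_FORMAT.format(Instant.ofEpochMilli(epochMillis)) + "/part";
    }

    // Groups records by hour path: the number of groups equals the number of
    // distinct hours in the input, however many input files there are.
    static Map<String, List<String>> groupByHour(List<LogRecord> records) {
        Map<String, List<String>> byHour = new TreeMap<>();
        for (LogRecord r : records) {
            byHour.computeIfAbsent(hourPath(r.epochMillis), k -> new ArrayList<>()).add(r.line);
        }
        return byHour;
    }

    public static void main(String[] args) {
        List<LogRecord> records = List.of(
                new LogRecord(Instant.parse("2014-03-01T07:10:00Z").toEpochMilli(), "GET /a"),
                new LogRecord(Instant.parse("2014-03-01T07:50:00Z").toEpochMilli(), "GET /b"),
                new LogRecord(Instant.parse("2014-03-01T08:05:00Z").toEpochMilli(), "GET /c"));
        groupByHour(records).forEach((path, lines) -> System.out.println(path + " -> " + lines));
    }
}
```

With these three sample records the sketch produces two groups, one for 07:00 and one for 08:00, without ever referring to a reducer count.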
