hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: Splitting output of MapReduce according to file size
Date Sat, 10 Nov 2007 20:10:53 GMT
On Sat, Nov 10, 2007 at 07:56:22PM +0000, Holger Stenzhorn wrote:
>For testing purposes I am running Hadoop in local mode.
>Is there a possibility to split the output (TextOutputFormat) of a 
>MapReduce job into several output files (e.g. "part-0000", "part-0001", 
>etc.) according to some maximal file size per file?

I'd say the easiest way is to do the splitting as a post-processing step after your job...
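
A rough sketch of such a post-processing step, in plain Java without any Hadoop classes. The class name OutputSplitter, the method split and the maxBytes parameter are made up for illustration, and the "part-0000" naming simply follows your example. It assumes line-oriented UTF-8 text (as TextOutputFormat produces) and never cuts a line in half, so a part can exceed the limit only when a single line is longer than the limit.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class OutputSplitter {

    // Splits `input` into part-0000, part-0001, ... inside `outDir`.
    // A new part is started whenever adding the next line would push the
    // current part past `maxBytes`. Lines are kept whole.
    static List<Path> split(Path input, Path outDir, long maxBytes) throws IOException {
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        long written = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Size of the line on disk, +1 for the trailing '\n'.
                long size = line.getBytes(StandardCharsets.UTF_8).length + 1;
                if (out == null || (written > 0 && written + size > maxBytes)) {
                    if (out != null) {
                        out.close();
                    }
                    Path part = outDir.resolve(String.format("part-%04d", parts.size()));
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                    written = 0;
                }
                out.write(line);
                out.write('\n');
                written += size;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

Calling OutputSplitter.split(jobOutputFile, targetDir, 64L * 1024 * 1024) would then cap each part at roughly 64 MB; the paths and the limit are placeholders for whatever your job writes.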

You could also run your job with multiple reduces to get multiple files (each reduce writes one output file).
By supplying your own Partitioner you can control how much data goes to each reducer (see org.apache.hadoop.mapred.Partitioner).
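
To illustrate the idea, here is a sketch of the kind of logic that could go inside a Partitioner's getPartition(key, value, numPartitions) method. It is deliberately written as plain Java with a String key so it runs without Hadoop on the classpath; the class name FirstLetterPartitionSketch and the choice of partitioning by first letter are assumptions for the example, not anything Hadoop prescribes. The exact key and value types of the real interface depend on your Hadoop version, so check the Javadoc before adapting it.

```java
final class FirstLetterPartitionSketch {

    // Maps a key to one of `numPartitions` buckets by its first letter,
    // so keys starting with 'a' land in the first reducer and keys
    // starting with 'z' in the last. Anything sorting before 'a'
    // (digits, punctuation, empty keys) goes to bucket 0, anything
    // after 'z' to the last bucket.
    static int getPartition(String key, int numPartitions) {
        if (key.isEmpty()) {
            return 0;
        }
        char c = Character.toLowerCase(key.charAt(0));
        if (c < 'a') {
            return 0;
        }
        if (c > 'z') {
            return numPartitions - 1;
        }
        // Scale the 26 letters evenly over the available partitions.
        return (c - 'a') * numPartitions / 26;
    }
}
```

Note that this only balances output sizes if your keys are spread fairly evenly over the alphabet; with skewed keys you would have to pick the range boundaries from what you know about your data.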


>I.e. is there a setting such as a file size that can be set in the 
>hadoop-site.xml for example?
>Even through reading the documentation and mailing list I did not find a 
>simple solution...  I really appreciate your help!
