hadoop-mapreduce-user mailing list archives

From Dieter De Witte <drdwi...@gmail.com>
Subject Re: Combine mapreduce result files
Date Sun, 28 Dec 2014 22:33:16 GMT
The number of part files is determined by the number of reduce tasks, which
can be tuned. For a small problem, run the job with only one reduce task. For
a big problem, you can run a second job whose map and reduce operators only
emit their input key-value pairs unchanged, and set its number of reduce tasks
to the number of files you'd like to have.
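
To make the idea concrete, here is a rough sketch in plain Python (not Hadoop code) that simulates such a pass-through second job. The names `partition` and `identity_job` are made up for illustration, and the CRC-based partitioner is only a stand-in for whatever partitioner the real job would use. It is meant to show only that the output file count follows the reduce task count.

```python
import zlib


def partition(key, num_reduce_tasks):
    # Hypothetical stand-in for a hash partitioner: map a key to a reducer index.
    # crc32 is used because Python's built-in hash() of str varies between runs.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def identity_job(part_files, num_reduce_tasks):
    """Simulate a job whose map and reduce steps emit their input unchanged.

    part_files: list of lists of (key, value) pairs, one list per input file.
    Returns a dict mapping an output file name to its list of pairs.
    """
    buckets = [[] for _ in range(num_reduce_tasks)]
    for part in part_files:
        for key, value in part:  # identity map: emit the pair as-is
            buckets[partition(key, num_reduce_tasks)].append((key, value))
    # Each simulated reduce task sees its pairs sorted by key and writes one file.
    return {
        "part-r-%05d" % i: sorted(bucket, key=lambda kv: kv[0])
        for i, bucket in enumerate(buckets)
    }


if __name__ == "__main__":
    parts = [[("b", 1), ("a", 2)], [("c", 3)], [("a", 4)]]
    merged = identity_job(parts, num_reduce_tasks=1)
    for name, pairs in merged.items():
        print(name, pairs)
```

With `num_reduce_tasks=1` all pairs from the three input lists end up in a single output, and with a larger value they are spread over that many outputs. In a real job the corresponding setting would be the job's number of reduce tasks.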

Regards, D

2014-12-28 13:50 GMT+01:00 tai khuu <khuutantai@gmail.com>:

> Hi, I would like to combine mapreduce part files into a single file. Is
> there any good solution for this? Currently I go through the file list
> and combine the files one by one in a single thread, but I have some
> concerns about the performance. I think that if the data volume is big
> enough, my current solution will perform very badly.
