hadoop-common-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: is it possible to concatenate output files under many reducers?
Date Fri, 13 May 2011 01:57:53 GMT
You can control the number of reducers by calling
job.setNumReduceTasks() before you launch it.
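For reference, the call goes on the Job object before submission. A minimal driver fragment, assuming the newer `org.apache.hadoop.mapreduce` API; the class name, job name, and argument positions are placeholders, and this is a job-configuration sketch rather than a runnable program:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TenOutputsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "ten-outputs");
        job.setJarByClass(TenOutputsDriver.class);
        // Ten reduce tasks produce at most ten output-r-NNNNN files.
        job.setNumReduceTasks(10);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```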

-Joey

On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim <juneng603@gmail.com> wrote:
> yes, that is the general way to control the number of output files.
>
> however, how can you control the number of outputs dynamically?
>
> for example, if an output is named 'A', it needs to be split across 5
> files; if an output is named 'B', it needs to be split across 10 files.
>
> is this possible under hadoop?
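One way to get per-name file counts inside a single job is a custom partitioner that routes each output name to its own fixed range of reduce partitions. In a real job this arithmetic would live in `getPartition()` of a subclass of `org.apache.hadoop.mapreduce.Partitioner` (with `setNumReduceTasks(15)` for the 5 + 10 split from the question); the class name `NamedRangePartitioner` and the range table below are illustrative assumptions, but the routing math itself is plain Java:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keys destined for output 'A' spread over partitions 0-4
// (5 files), keys for output 'B' over partitions 5-14 (10 files).
public class NamedRangePartitioner {
    // output name -> {first partition, number of partitions}
    private static final Map<String, int[]> RANGES = new LinkedHashMap<>();
    static {
        RANGES.put("A", new int[] {0, 5});
        RANGES.put("B", new int[] {5, 10});
    }

    public static int getPartition(String outputName, String key) {
        int[] range = RANGES.get(outputName);
        if (range == null) {
            throw new IllegalArgumentException("unknown output: " + outputName);
        }
        // Non-negative hash, spread evenly within this output's range.
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % range[1];
        return range[0] + bucket;
    }
}
```

Pairing this with `MultipleOutputs` (or a reducer that writes to a file named after the key group) gives each logical output its own set of files.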
>
> Junyoung Kim (juneng603@gmail.com)
>
>
> On 05/12/2011 02:17 PM, Harsh J wrote:
>>
>> Short, blind answer: You could run 10 reducers.
>>
>> Otherwise, you'll have to run another job whose mappers each pick up a
>> few files and merge them. But having 60 files shouldn't really be a
>> problem if they are sufficiently large (at least 80% of a block size,
>> perhaps -- you can tune the number of reducers to achieve this).
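As a back-of-the-envelope check on that sizing rule, the reducer count can be derived from the expected total output size. A sketch in plain Java; the 80% fill floor is the figure from the reply above, and the method name is illustrative:

```java
public class ReducerSizing {
    // Pick a reducer count so each output file holds at least
    // minFill (e.g. 0.8) of one HDFS block.
    public static int reducersFor(long totalOutputBytes, long blockBytes,
                                  double minFill) {
        long minFileBytes = (long) (blockBytes * minFill);
        int reducers = (int) (totalOutputBytes / minFileBytes);
        return Math.max(1, reducers); // always run at least one reducer
    }
}
```

For example, with a 64 MB block and 0.8 fill, each output file should carry at least ~51 MB, and the total output size divided by that figure gives the reducer count.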
>>
>> On Thu, May 12, 2011 at 6:14 AM, Jun Young Kim<juneng603@gmail.com>
>>  wrote:
>>>
>>> hi, all.
>>>
>>> I have 60 reducers, which generate 60 output files,
>>>
>>> from output-r-00001 to output-r-00059.
>>>
>>> in this situation, I want to control the number of output files.
>>>
>>> for example, is it possible to concatenate all the output files down to 10,
>>>
>>> from output-r-00001 to output-r-00010?
>>>
>>> thanks
>>>
>>> --
>>> Junyoung Kim (juneng603@gmail.com)
>>>
>>>
>>
>>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
