hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Multithreaded Reducer
Date Fri, 10 Apr 2009 18:54:05 GMT
At that level of parallelism, you're right that the process overhead would
be too high.
- Aaron


On Fri, Apr 10, 2009 at 11:36 AM, Sagar Naik <snaik@attributor.com> wrote:

>
> Two things:
> - Multi-threading is preferred over multiple processes. The process I'm
> planning is I/O-bound, so I can really take advantage of multiple threads (100
> threads).
> - Correct me if I'm wrong: the next MR job in the pipeline will have an
>  increased number of splits to process, because the number of reducer outputs
> (from the previous job) has increased. This leads to an increase
>  in the map-task completion time.
>
>
>
> -Sagar
>
>
> Aaron Kimball wrote:
>
>> Rather than implementing a multi-threaded reducer, why not simply increase
>> the number of reducer tasks per machine via
>> mapred.tasktracker.reduce.tasks.maximum, and increase the total number of
>> reduce tasks per job via mapred.reduce.tasks to ensure that they're all
>> filled? This will effectively utilize a higher number of cores.
>>
>> - Aaron
>>
>> On Fri, Apr 10, 2009 at 11:12 AM, Sagar Naik <snaik@attributor.com>
>> wrote:
>>
>>
>>
>>> Hi,
>>> I would like to implement a multi-threaded reducer.
>>> As I understand it, the system does not have one because we expect the
>>> output to be sorted.
>>>
>>> However, in my case I don't need the output sorted.
>>>
>>> Can you please point me to any other issues, or would it be safe to do so?
>>>
>>> -Sagar
>>>
>>>
>>>
>>
>>
>>
>
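For illustration, here is a minimal sketch of the multi-threaded reduce idea Sagar describes, written as plain Java on java.util.concurrent rather than against the Hadoop API, so it runs without Hadoop on the classpath. The class name ThreadedReduceSketch, the processKey stand-in for the I/O-bound per-key work, and the use of an in-memory map of key groups are all hypothetical; the pool size of 100 is taken from Sagar's message. Output order is not preserved, which is only acceptable because sorted output isn't needed in his case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: fan per-key reduce work out to a fixed thread pool. */
public class ThreadedReduceSketch {

    // Hypothetical stand-in for the I/O-bound per-key work (e.g. a remote call).
    // Here it just sums the values so the result is easy to check.
    static String processKey(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return key + "\t" + sum;
    }

    /**
     * Runs processKey for every key group on nThreads worker threads.
     * Results are collected in completion order, so the output is unsorted.
     */
    static List<String> reduceAll(Map<String, List<Integer>> groups, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        // Thread-safe sink; assume a real output collector is NOT safe for
        // concurrent writes and guard it accordingly.
        ConcurrentLinkedQueue<String> output = new ConcurrentLinkedQueue<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
                String key = e.getKey();
                // Copy the values before handing them to another thread: in a
                // real Hadoop reducer the framework reuses the Writable objects
                // returned by the values iterator.
                List<Integer> values = new ArrayList<>(e.getValue());
                pending.add(pool.submit(() -> {
                    output.add(processKey(key, values));
                }));
            }
            // Block until every task finishes; surface the first failure.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("per-key task failed", ex.getCause());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", ex);
                }
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(output);
    }
}
```

In an actual reducer one would presumably submit work from each reduce() call and wait for the outstanding futures in close(), so all task output is flushed before the task reports completion. The trade-off Aaron raises still applies: more reduce tasks give process isolation at the cost of per-process overhead, while threads share one JVM and one output file.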
