hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Parallelism of sorts
Date Mon, 05 May 2008 18:29:00 GMT
Brice Arnould wrote:
> I was asking myself if it could be a good idea to parallelize some of the
> algorithms of Hadoop, such as MergeSorter, for the case of a single job
> run on a multicore system.

One can already exploit parallelism on a multicore system by using 
"pseudo-distributed" mode and increasing 
mapred.tasktracker.map.tasks.maximum and 
mapred.tasktracker.reduce.tasks.maximum.
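
As a rough illustration of the two settings named above, here is a small
Java sketch that derives slot counts from the number of cores and prints
them as Hadoop-style <property> entries. The class name SlotConfigSketch,
its methods, and the "split the cores evenly between maps and reduces"
rule are assumptions made for this example, not a recommendation from
Hadoop; only the two property names come from this message. In releases
of this era such overrides would typically be placed in
conf/hadoop-site.xml, but check the documentation for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Hadoop, for illustration only.
class SlotConfigSketch {

    // Assumption: give roughly half the cores to map slots and the rest
    // to reduce slots, with at least one slot of each kind.
    static Map<String, Integer> slotsFor(int cores) {
        int mapSlots = Math.max(1, cores / 2);
        int reduceSlots = Math.max(1, cores - mapSlots);
        Map<String, Integer> props = new LinkedHashMap<>();
        props.put("mapred.tasktracker.map.tasks.maximum", mapSlots);
        props.put("mapred.tasktracker.reduce.tasks.maximum", reduceSlots);
        return props;
    }

    // Render the settings as <property> entries in the Hadoop config format.
    static String toPropertyXml(Map<String, Integer> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.print(toPropertyXml(slotsFor(cores)));
    }
}
```

The right values depend on the job: tasks that are I/O-bound or
memory-hungry may call for fewer slots than there are cores.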

LocalRunner should also someday be enhanced to run multiple maps and 
reduces in separate threads, which would be more efficient, since 
intermediate data would not need to travel through the loopback network 
interface.  But I don't see an urgent case for making the sort code 
itself multi-threaded, since MapReduce itself performs parallel sorting.
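
To make the last point concrete, here is a minimal Java sketch of how
partitioning yields a parallel sort without a multi-threaded sort routine:
keys are routed to buckets by range, each bucket is sorted by an ordinary
single-threaded sort in its own thread (standing in for one reduce task),
and the sorted buckets are concatenated. It is a toy model under stated
assumptions, not Hadoop code: the class PartitionedSortSketch, the integer
keys in [0, maxKey], and the range-based partitioning rule are all
invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy model: not part of Hadoop, for illustration only.
class PartitionedSortSketch {

    // Sorts keys assumed to lie in [0, maxKey] using `partitions` threads.
    static List<Integer> sort(List<Integer> input, int partitions, int maxKey) {
        // Partition step: route each key to a bucket by key range, so that
        // every key in bucket i is smaller than every key in bucket i + 1.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : input) {
            int p = (int) ((long) key * partitions / ((long) maxKey + 1));
            buckets.get(p).add(key);
        }

        // Sort step: each bucket gets a plain sequential sort, but the
        // buckets are sorted concurrently, one task per bucket.
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<Integer>>> sorted = new ArrayList<>();
            for (List<Integer> bucket : buckets) {
                sorted.add(pool.submit(() -> {
                    Collections.sort(bucket);
                    return bucket;
                }));
            }
            // Concatenating the buckets in partition order gives a global sort.
            List<Integer> out = new ArrayList<>();
            for (Future<List<Integer>> f : sorted) {
                out.addAll(f.get());
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("bucket sort failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(42, 7, 99, 0, 63, 18, 75, 31);
        System.out.println(sort(keys, 4, 99)); // prints [0, 7, 18, 31, 42, 63, 75, 99]
    }
}
```

The parallelism here comes entirely from running several ordinary sorts
at once, which is why a multi-threaded sort implementation adds little
when the framework already splits the data this way.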

Doug
