hadoop-common-user mailing list archives

From "Alex Loddengaard" <a...@cloudera.com>
Subject Re: Optimized way
Date Thu, 04 Dec 2008 17:52:43 GMT
Well, MapReduce and Hadoop by definition run maps in parallel.  I think
you're interested in the two configuration settings that cap the number of
concurrent map and reduce tasks per tasktracker.

These go in hadoop-site.xml and set the number of map and reduce tasks
for each tasktracker (node).  The Hadoop cluster documentation covers them
in more detail.

The sum of map and reduce tasks should be slightly above the number of
cores you have per node.  So if you have 8 cores per node, setting map
tasks to 6 and reduce tasks to 4 (10 slots total) would probably be good.
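For reference, here is a minimal hadoop-site.xml sketch for that 8-core
case.  I'm assuming the pre-0.20 property names
(mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum, later renamed under the
mapreduce.tasktracker.* prefix), so check the docs for your release:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value> <!-- at most 6 concurrent map tasks on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value> <!-- at most 4 concurrent reduce tasks on this node -->
</property>
```

These are per-tasktracker limits, so every node in the cluster gets the
same cap.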

Hope this helps.


On Thu, Dec 4, 2008 at 6:42 AM, Aayush Garg <aayush.garg@gmail.com> wrote:

> Hi,
> I have a 5-node cluster for Hadoop use. All nodes are multi-core.
> I am running a shell command in the map function of my program, and this
> shell command takes one file as input. Many such files are copied in the
> So in summary the map function will run a command like ./run <file1>
> <outputfile1>
> Could you please suggest the optimized way to do this, e.g. whether I can
> use the multi-core processing of the nodes and run many such maps in
> parallel?
> Thanks,
> Aayush
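
Since the map step here is just an external command, Hadoop Streaming may
be the simplest fit: the framework then schedules the map tasks across
nodes and cores for you.  A rough sketch of the invocation (the jar path
matches the 0.18-era contrib layout; it assumes run can be adapted to read
its input on stdin and write results to stdout, which is the streaming
contract, and names like file-list are hypothetical):

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
  -input file-list \
  -output run-output \
  -mapper ./run \
  -file run
```

With one input file name per line in file-list, each map task invokes run
once, and the per-tasktracker maximums control how many instances run
concurrently on each node.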
