hadoop-mapreduce-dev mailing list archives

From "Tudor Vlad (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1781) option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of mappers is bigger than the number of nodes - always spawns 2 mappers/node
Date Mon, 10 May 2010 19:18:30 GMT
option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of mappers is bigger
than the number of nodes - always spawns 2 mappers/node
--------------------------------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1781
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.20.2
         Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
            Reporter: Tudor Vlad


Hello

I am a new user of Hadoop and I am having some trouble with Hadoop Streaming and the "-D mapred.tasktracker.map.tasks.maximum"
option.

I'm experimenting with an unmanaged application (C++) that I want to run over several nodes
in two scenarios:
1) the number of maps (input splits) is equal to the number of nodes
2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)

Initially, when running the tests in scenario 1, I would sometimes get 2 processes/node on half
the nodes. I fixed this by adding the option "-D mapred.tasktracker.map.tasks.maximum=1",
and everything worked fine.

In scenario 2 (more maps than nodes) this directive no longer works: I always get
2 processes/node. I even tested with maximum=5 and I still get 2 processes/node.

The entire command I use is:

/usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w" \
 /opt/hadoop/bin/hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
 -D mapred.tasktracker.map.tasks.maximum=1 \
 -D mapred.map.tasks=30 \
 -D mapred.reduce.tasks=0 \
 -D io.file.buffer.size=5242880 \
 -libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
 -input input/test \
 -output out1 \
 -mapper "/opt/jobdata/script_1k" \
 -inputformat "me.MyInputFormat"
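For comparison, here is a minimal sketch of how this limit is usually set cluster-wide rather than per job. This assumes the standard conf/mapred-site.xml layout on each tasktracker node; the property name is the same one passed with -D in the command above:

```xml
<!-- conf/mapred-site.xml on each tasktracker node (sketch).
     The tasktracker reads this at startup, so it must be
     restarted for a change to take effect. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```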

Why is this happening, and how can I make it work properly (i.e. limit exactly how
many mappers can run at one time on each node)?

Thank you in advance

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

