hadoop-general mailing list archives

From Corneliu-Tudor Vlad <corneliutudor.v...@ens-lyon.fr>
Subject Problem with Hadoop Streaming and -D mapred.tasktracker.map.tasks.maximum option
Date Mon, 10 May 2010 14:07:24 GMT


I am a new user of Hadoop and I have some trouble using Hadoop  
Streaming and the "-D mapred.tasktracker.map.tasks.maximum" option.

I'm experimenting with an unmanaged (C++) application that I want to
run over several nodes in 2 scenarios:
1) the number of maps (input splits) is equal to the number of nodes
2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)

Initially, when running the tests in scenario 1, I would sometimes get
2 processes per node on half the nodes. I fixed this by adding the
directive -D mapred.tasktracker.map.tasks.maximum=1, and now everything
works fine.

In scenario 2 (more maps than nodes) this directive no longer works:
I always get 2 processes per node. I even tested with maximum=5 and
I still get 2 processes per node.

The entire command I use is:

/usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w" \
  /opt/hadoop/bin/hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.tasktracker.map.tasks.maximum=1 \
  -D mapred.map.tasks=30 \
  -D mapred.reduce.tasks=0 \
  -D io.file.buffer.size=5242880 \
  -libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
  -input input/test553short \
  -output out1 \
  -mapper "/opt/jobdata/script_1k" \
  -inputformat "me.MyInputFormat"

I'm using Debian Lenny x64 and Hadoop 0.20.2.

My question is: why is this happening, and how can I make it work
properly (i.e. be able to limit exactly how many mappers run at
one time on each node)?
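
To make the behaviour I expect concrete, here is a minimal sketch of the
arithmetic, assuming each node runs at most a fixed number of map tasks
at once. The function name "waves" and the node count of 6 are
hypothetical, chosen only for illustration; 30 is the mapred.map.tasks
value from my command.

```shell
# waves NODES MAPS SLOTS
# Prints how many rounds of map tasks are needed when every node
# runs at most SLOTS mappers at a time (hypothetical helper).
waves() {
  nodes=$1
  maps=$2
  slots=$3
  # Total mappers the cluster can run concurrently.
  capacity=$(( nodes * slots ))
  # Ceiling division: a partially filled last round still counts.
  echo $(( (maps + capacity - 1) / capacity ))
}

# Hypothetical 6-node cluster, 30 maps, 1 mapper per node (what I want):
waves 6 30 1
# Same cluster with 2 mappers per node (what I actually observe):
waves 6 30 2
```

With one mapper per node I would expect more, shorter-lived rounds of
tasks, which is the situation I am trying to measure.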

Thank you in advance,
