hadoop-common-user mailing list archives

From C G <parallel...@yahoo.com>
Subject Questions regarding configuration parameters...
Date Wed, 20 Feb 2008 17:30:58 GMT
Hi All:
  The documentation for the configuration parameters mapred.map.tasks and mapred.reduce.tasks
discusses these values in terms of the “number of available hosts” in the grid.  This description
strikes me as a bit odd, given that a “host” could be anything from a uniprocessor to an
N-way box, where N could range from 2 to 16 or more.  The documentation is also vague
about how to compute the actual value.  For example, for mapred.map.tasks the doc says “…a prime
number several times greater…”.  I’m curious about how people are interpreting the descriptions
and what values people are using.  Specifically, I’m wondering whether I should be using “core
count” instead of “host count” to set these values.
  In the specific case of my system, we have 24 hosts where each host is a 4-way system (i.e.
96 cores total).  For mapred.map.tasks I chose the value 173, as that is a prime number which
is near 7*24.  For mapred.reduce.tasks I chose 23 since that is a prime number close to 24.
 Is this what was intended?
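  To make the arithmetic above concrete, here is a minimal sketch (in Python, purely for
illustration) of one possible reading of the documentation: take the first prime at or above
a multiple of the unit count for maps, and the largest prime at or below the unit count for
reduces.  The helper names, and the multiplier of 7 taken from my own 7*24 choice, are
assumptions of mine and not anything Hadoop defines.

```python
# Sketch only: one interpretation of "a prime number several times greater".
# All function names here are hypothetical helpers, not part of Hadoop.

def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small task counts."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def prev_prime(n: int) -> int:
    """Largest prime less than or equal to n (n must be at least 2)."""
    if n < 2:
        raise ValueError("no prime at or below %d" % n)
    while not is_prime(n):
        n -= 1
    return n


def suggest_task_counts(units: int, map_multiplier: int = 7) -> tuple:
    """Return (map_tasks, reduce_tasks) for a given number of hosts or cores.

    map_multiplier=7 is an assumed stand-in for "several times greater".
    """
    return next_prime(map_multiplier * units), prev_prime(units)


HOSTS = 24
CORES_PER_HOST = 4

# Counting hosts reproduces the values I chose.
print(suggest_task_counts(HOSTS))                   # (173, 23)

# Counting cores instead gives much larger values.
print(suggest_task_counts(HOSTS * CORES_PER_HOST))  # (673, 89)
```

  Whether the host-based or the core-based pair is the intended one is exactly what I am
asking about.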
  Beyond curiosity, I’m concerned about setting these values and other configuration parameters
correctly because I am investigating some performance issues where it is taking a very long time
to process small amounts of data.  I am hoping that some amount of tuning will resolve these issues.
  Any thoughts and insights most appreciated.
  C G
