hadoop-common-user mailing list archives

From "Tarandeep Singh" <tarand...@gmail.com>
Subject Optimal values of parameters in hadoop-site.xml
Date Tue, 23 Sep 2008 16:52:03 GMT

I am running a small cluster of 4 nodes, each node having quad-core CPUs and 8
GB of RAM. I have used the following values for parameters in
hadoop-site.xml. I want to know whether I can increase performance further by
changing one or more of these:

dfs.replication: I have set it to 2. Will I get a performance boost if I set
it to 4 (the number of nodes)? If so, how much replication do people use
when they run a cluster of, say, 1000 nodes? Do they replicate petabytes of
data?
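As a rough storage-cost check (my sketch, not from the post): HDFS stores replication × the logical data size, so a higher dfs.replication mainly buys fault tolerance and read locality at a raw-disk cost. The HDFS default is 3, and that is commonly kept even at large scale.

```python
def raw_storage_tb(logical_tb, replication=3):
    """Raw HDFS capacity consumed: each block is stored `replication` times.
    replication=3 is the HDFS default; 2 is the setting described in the post."""
    return logical_tb * replication

print(raw_storage_tb(1000))                 # 1 PB of data occupies 3 PB of raw disk
print(raw_storage_tb(1000, replication=2))  # 2 PB at replication 2
```

So raising replication from 2 to 4 on a 4-node cluster doubles disk usage and write traffic; any read-locality gain would have to outweigh that.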

mapred.child.java.opts: -Xms4096m -Xmx7500m. I tried different min and max
memory settings and found no improvement in performance. I was thinking that
giving more memory to the process would help it do sorting/shuffling etc.
more quickly, but it seems my thinking is not correct. Can anyone comment on
this parameter and what its optimal value should be?
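One thing worth checking (a back-of-the-envelope sketch; the slot counts are assumed defaults, not stated in the post): each task slot launches its own child JVM with those opts, so with the usual 2 map + 2 reduce slots per TaskTracker, a 7500 MB max heap per child overcommits an 8 GB node badly.

```python
# Assumed: mapred.tasktracker.{map,reduce}.tasks.maximum at their default of 2 each.
slots_per_node = 2 + 2        # concurrent child JVMs per node (assumption)
max_heap_mb = 7500            # from -Xmx7500m in mapred.child.java.opts
node_ram_mb = 8 * 1024        # 8 GB nodes

worst_case_mb = slots_per_node * max_heap_mb
print(worst_case_mb)                 # 30000 MB of potential heap demand
print(worst_case_mb > node_ram_mb)   # True: the heaps cannot all reach -Xmx
```

If the heaps can never actually grow to -Xmx before the node starts swapping, raising it would show no benefit, which may explain the flat results.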

fs.inmemory.size.mb: I have set it to 225. Increasing it further does not
help. Also, can someone explain in detail how this parameter affects
performance?

io.sort.mb: I have set it to 200. Increasing it further does not help, at
least in my jobs. Does anyone have more details about this parameter?
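A simplified model of what io.sort.mb buys (my sketch, not exact Hadoop internals): the map side buffers output in memory and spills to disk roughly each time the buffer fills to io.sort.spill.percent of io.sort.mb. Once a map task's whole output fits in a single spill, a bigger buffer cannot reduce disk I/O further, which would match the observation above on small inputs.

```python
import math

def estimated_spills(map_output_mb, io_sort_mb=200, spill_percent=0.80):
    """Rough spill-count estimate (an approximation, not the exact
    Hadoop algorithm): output size divided by the usable buffer,
    where 0.80 is the default io.sort.spill.percent."""
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))

print(estimated_spills(128))   # small map output: 1 spill, bigger buffer won't help
print(estimated_spills(1024))  # ~7 spills; here a larger io.sort.mb would matter
```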

mapred.map.tasks: After reading the description, I set its value to 41
(the nearest prime to 10 × the number of nodes).
mapred.reduce.tasks: I set its value to 5 (the nearest prime to the number of
nodes).
However, I noticed there was not much performance gain; if I use the default
values, I get similar performance. But I ran the test on a small amount of
data and have not tested with a huge data set. I would like to know how
these parameters are going to affect performance.
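For what it's worth, the Hadoop MapReduce tutorial's rule of thumb is slightly different from the prime heuristic: mapred.map.tasks is only a hint (the actual map count is driven by input splits), and the suggested reduce count is 0.95 × (nodes × reduce slots per node) for one wave, or 1.75 × for two waves with better load balancing. A sketch, assuming 2 reduce slots per node (not stated in the post):

```python
def suggested_reduces(nodes, reduce_slots_per_node=2, factor=0.95):
    """Rule of thumb from the Hadoop MapReduce tutorial: factor=0.95 gives
    one wave of reduces, factor=1.75 gives two waves for load balancing.
    reduce_slots_per_node=2 is an assumed default, not from the post."""
    return int(factor * nodes * reduce_slots_per_node)

print(suggested_reduces(4))               # 7 reduces for a one-wave run on 4 nodes
print(suggested_reduces(4, factor=1.75))  # 14 reduces for two waves
```

On a small input these knobs barely matter, since the job may not even generate enough splits to use the configured task counts.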

I used the default values for-


Thanks a lot,
