hadoop-hdfs-user mailing list archives

From Alex Bohr <a...@gradientx.com>
Subject Best Practices: mapred.job.tracker.handler.count, dfs.namenode.handler.count
Date Tue, 05 Mar 2013 00:33:28 GMT
Hi,
I'm looking for some feedback on how to decide how many threads to assign
to the Namenode and Jobtracker?

I currently have 24 data nodes (running CDH3) and am finding a lot of varying
advice on how to set these properties and change them as the cluster grows.

Some (older) documentation (
http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
, http://hadoop.apache.org/docs/r1.0.4/mapred-default.html ) has it in the
range of the default 10 for a smallish cluster.
And the O'Reilly "Hadoop Operations" book puts it a good deal higher and
gives a handy precise formula: the natural log of the number of nodes, times 20, or:

python -c 'import math ; print int(math.log(24) * 20)'

which gives 63 for 24 nodes.
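For reference, the book's heuristic can be sketched as a small function (the name handler_count and the use of the old default of 10 as a floor are my own choices, not from the book):

```python
import math

def handler_count(num_nodes, floor=10):
    """ln(num_nodes) * 20 heuristic from Hadoop Operations,
    with the old default of 10 as a lower bound."""
    return max(floor, int(math.log(num_nodes) * 20))

print(handler_count(24))   # 63 for this 24-node cluster
print(handler_count(100))  # 92 if we grow to 100 nodes
```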

Does anyone have strong opinions on how to set these variables?  Does
anyone else use the natural log times 20?
Are there other factors beyond the number of nodes that should be
considered?  I'm assuming memory available on the NameNode/Jobtracker
plays a big part, but right now I have a good amount of unused memory, so
I'm OK going with a higher number.
My jobtracker is occasionally freezing, so this is one of the configs I
think might be causing problems.

And the second, less important, part of the question: is there any need to
put these properties in their respective config files (mapred-site.xml,
hdfs-site.xml) on any node other than the Namenode?
I've looked but have never found any good documentation discussing which
properties need to be on which machine, and I'd prefer to keep properties
off of a machine if they don't need to be there (so I don't need to restart
anything when a property changes, and environments stay simpler).
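For concreteness, here is what I'd be setting. The property names are the ones in the subject line; the values are just illustrative (the 63 comes from the formula above for 24 nodes):

```xml
<!-- hdfs-site.xml on the NameNode host; value is illustrative -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>63</value>
</property>

<!-- mapred-site.xml on the Jobtracker host; value is illustrative -->
<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>63</value>
</property>
```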

Thanks
