hadoop-common-user mailing list archives

From David Ritch <david.ri...@gmail.com>
Subject System Layout Best Practices
Date Thu, 05 Mar 2009 16:20:01 GMT
Are there any published guidelines on system configuration for Hadoop?

I've seen hardware suggestions, but I'm really interested in recommendations
on disk layout and partitioning.  The defaults shipped in hadoop-default.xml
may be fine for testing, but they are not appropriate for sustained use.  For
example, HDFS data and metadata are both stored under /tmp.  In typical use on
a cluster with a couple hundred nodes, the NameNode can also generate 3-5 GB
of logs per day, so if the NameNode host is laid out badly, it's easy to fill
the partition holding the HDFS metadata and clobber the filesystem.  I would
also think that thresholding logs at WARN rather than INFO would be preferable.
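
For reference, here's a sketch of the sort of hadoop-site.xml overrides I have
in mind.  The property names (dfs.name.dir, dfs.data.dir, hadoop.tmp.dir) are
the standard ones from hadoop-default.xml; the paths are just hypothetical
examples of moving everything off /tmp:

  <?xml version="1.0"?>
  <configuration>
    <!-- NameNode metadata on its own partition, not /tmp -->
    <property>
      <name>dfs.name.dir</name>
      <value>/hadoop/dfs/name</value>  <!-- hypothetical path -->
    </property>
    <!-- DataNode block storage, one directory per data disk -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/1/dfs/data,/data/2/dfs/data</value>  <!-- hypothetical paths -->
    </property>
    <!-- Anything still defaulting under hadoop.tmp.dir also off /tmp -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data/1/hadoop-tmp</value>  <!-- hypothetical path -->
    </property>
  </configuration>

I assume the log threshold can be raised by changing hadoop.root.logger from
INFO to WARN in conf/log4j.properties, but I haven't measured how much of that
3-5 GB/day it actually saves.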

On a DataNode, we would like to reserve as much space as we can for HDFS data,
but we know that map-reduce jobs need some local storage.  How do people
generally estimate the amount of space required for that temporary storage?  I
would assume it's good to keep it on a partition separate from the data
storage, so that nodes don't run out of temp space.  I would also think it
would be better for performance to put it on a different spindle, so that temp
space and HDFS data can be accessed independently.
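
Concretely, I'm imagining something like the following (again, hypothetical
paths, using the standard mapred.local.dir and dfs.datanode.du.reserved
properties): map-reduce scratch space on its own partition/spindle, and some
space held back from HDFS on each data volume:

  <!-- Map-reduce temporary storage on a separate partition/spindle -->
  <property>
    <name>mapred.local.dir</name>
    <value>/scratch/mapred/local</value>  <!-- hypothetical path -->
  </property>
  <!-- Bytes per volume kept free for non-HDFS use; 10 GB is only a guess -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>

What I don't have is a good rule of thumb for sizing either of these, which is
really what I'm asking.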

I would be interested to know how other sites configure their systems, and I
would love to see some guidelines for system configuration for Hadoop.

Thank you!

