hadoop-hdfs-user mailing list archives

From Dave Shine <Dave.Sh...@channelintelligence.com>
Subject dfs.name.dir and fs.checkpoint.dir
Date Wed, 04 Jan 2012 14:28:40 GMT
Per recommendations I received in Cloudera's Hadoop Administrator training, I configured our
dfs.name.dir property with 3 directories, one on the NN, one on an nfs mount to a Hadoop client
machine (in the same rack as the NN), and one to an nfs mount to a NAS (different rack, same
datacenter).  I also configured the fs.checkpoint.dir with 3 directories, one on the 2NN (the
NN is on one machine, the JT and 2NN are on a second machine), one on an nfs mount to the same
Hadoop client machine, and one to an nfs mount to the same NAS.
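
A minimal sketch of the layout described above, as it might appear in the Hadoop configuration files. Only the two property names and the three-directory arrangement come from the message; every path below is a hypothetical placeholder. Both properties accept a comma-delimited list of directories, and the image is written to each directory in the list.

```xml
<!-- Hypothetical paths; substitute the real local and nfs mount points. -->
<property>
  <name>dfs.name.dir</name>
  <!-- local disk on the NN, nfs mount to the client machine, nfs mount to the NAS -->
  <value>/data/dfs/name,/mnt/client-nfs/dfs/name,/mnt/nas-nfs/dfs/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- local disk on the 2NN, nfs mount to the client machine, nfs mount to the NAS -->
  <value>/data/dfs/namesecondary,/mnt/client-nfs/dfs/namesecondary,/mnt/nas-nfs/dfs/namesecondary</value>
</property>
```

Dropping the NAS from the checkpoint list, as described below, would amount to removing the third entry from the fs.checkpoint.dir value.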

With this configuration we experienced severe delays in the delivery of updated fsimage files
from the 2NN to the NN (several hours for an fsimage file under 2GB).  I've since removed
the NAS from the fs.checkpoint.dir property and our network guys "optimized the hell out of
the nfs mount" and the updated fsimage files now get delivered to the NN in minutes.

My question is, is there really any reason at all for specifying more than one directory in
fs.checkpoint.dir?  I probably did it out of paranoia when I was first configuring the cluster.
How is this property configured in other Hadoop environments?

Dave Shine

