hadoop-hdfs-user mailing list archives

From Travis <hcoy...@ghostar.org>
Subject Re: ext4 on a hadoop cluster datanodes
Date Mon, 06 Oct 2014 22:09:30 GMT
For filesystem creation, we use the following mkfs.ext4 command:

mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L $HDFS_LABEL
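
mkfs.ext4 also needs the target device as its last argument, which is not shown above. Here is a minimal sketch for applying the same options to several data disks. The device names and the hdfsNN label scheme are hypothetical placeholders. The script only prints the commands, so you can review them before running anything destructive:

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the mkfs.ext4 command from the post for each data disk.
# ASSUMPTION: /dev/sdb.. and the hdfsNN labels are hypothetical placeholders.
set -euo pipefail

# Build the mkfs.ext4 command line for one device/label pair.
mkfs_cmd() {
  local device=$1 label=$2
  echo "mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L ${label} ${device}"
}

n=1
for dev in /dev/sdb /dev/sdc /dev/sdd; do
  mkfs_cmd "$dev" "$(printf 'hdfs%02d' "$n")"
  n=$((n + 1))
done
```

Once the printed commands look right for your hardware, you could pipe the output to a shell to run them.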

By default, mkfs creates far more inodes than a datanode needs, so we tune
this with the "largefile" usage type (-T largefile), which raises the
inode_ratio (bytes of disk per inode).  This gives us ~2 million usable
inodes on a 2TB filesystem.
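
The inode count is roughly the filesystem size divided by inode_ratio. The sketch below assumes the common /etc/mke2fs.conf values: 16384 bytes/inode for the default type, which mkfs picks for a filesystem of this size, and 1048576 for largefile. Check your own mke2fs.conf. Real counts differ slightly because inodes are allocated per block group.

```shell
#!/usr/bin/env bash
# Sketch: estimate ext4 inode counts as filesystem_size / inode_ratio.
# ASSUMPTION: ratios are the usual mke2fs.conf values (16384 default,
# 1048576 for "-T largefile"); your distribution's file may differ.
set -euo pipefail

# Print the approximate number of inodes for a size and bytes-per-inode ratio.
inode_count() {
  local size_bytes=$1 inode_ratio=$2
  echo $(( size_bytes / inode_ratio ))
}

size=$(( 2 * 1024 ** 4 ))   # a 2 TiB filesystem
echo "default (16 KiB/inode):  $(inode_count "$size" 16384)"
echo "largefile (1 MiB/inode): $(inode_count "$size" 1048576)"
```

For 2 TiB, this gives 134217728 inodes by default versus 2097152 with largefile. On an existing filesystem, tune2fs -l reports the actual "Inode count".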

Also, by default mkfs reserves 5% of the blocks for the root user, which
wastes a fair amount of space on a data disk, since only root can use that
space.  We tune this down to 1% at mkfs time (-m 1), but you can also
change it later with tune2fs -m.
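
Here is a sketch that puts the reserve in numbers. It uses 2 TB (decimal) disks and the 12-drive JBOD configuration described later in this message as example figures:

```shell
#!/usr/bin/env bash
# Sketch: space held back by the ext4 reserved-blocks percentage.
# ASSUMPTION: 2000 GB disks and 12 disks per node are example figures.
set -euo pipefail

# Reserved gigabytes for a disk size (GB) and reserve percentage.
reserved_gb() {
  local disk_gb=$1 percent=$2
  echo $(( disk_gb * percent / 100 ))
}

disks=12
for pct in 5 1; do
  per_disk=$(reserved_gb 2000 "$pct")
  echo "reserve ${pct}%: ${per_disk} GB per disk, $(( per_disk * disks )) GB per node"
done

# Changing it on an existing filesystem (device name is a placeholder):
#   tune2fs -m 1 /dev/sdX
```

With these figures, dropping from 5% to 1% frees 80 GB per disk, or 960 GB per 12-disk node.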

I don't know that I would use writeback.  This mode is problematic in the
event of a crash because it can leave a file with new metadata pointing at
old (stale) data.  I consider this corruption.  Unless you know your
environment to be super stable (meaning no OS- or hardware-induced crashes)
AND you have stable, UPS-backed power, I would steer clear of this.
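
data=writeback is set per mount, so it is easy to audit. ext4's default journaling mode is data=ordered. The minimal sketch below assumes a standard fstab layout and lists the ext4 mount points that use writeback. The labels and mount points in the example file are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag ext4 entries in an fstab-format file that mount with data=writeback.
# ASSUMPTION: standard fstab columns (device, mountpoint, type, options, ...).
set -euo pipefail

# Print the mount point of every ext4 entry whose options include data=writeback.
find_writeback() {
  local fstab=${1:-/etc/fstab}
  awk '$1 !~ /^#/ && $3 == "ext4" && ("," $4 ",") ~ /,data=writeback,/ { print $2 }' "$fstab"
}

# Example with a throwaway file; labels and mount points are hypothetical.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
LABEL=hdfs01  /data/01  ext4  defaults,noatime                 0 0
LABEL=hdfs02  /data/02  ext4  defaults,noatime,data=writeback  0 0
EOF
find_writeback "$tmp"    # prints /data/02
rm -f "$tmp"
```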

If you're looking for the utmost in filesystem performance, you're better
off looking at the controller card you're using.  Right now, we're using the
LSI 9207-8i and seeing an aggregate 1.6-1.8 GB/s of throughput across 12
drives in JBOD.  Our older LSI-based cards can sustain maybe a quarter of
that in the same disk configuration.
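
For scale, dividing those aggregates by the drive count gives the per-drive rate. The quick sketch below treats 1 GB/s as 1000 MB/s and uses integer division:

```shell
#!/usr/bin/env bash
# Sketch: per-drive share of an aggregate JBOD throughput figure.
# Uses the post's numbers (MB/s, 12 drives); integer division truncates.
set -euo pipefail

# Aggregate MB/s divided evenly across a number of drives.
per_drive_mbps() {
  local aggregate_mbps=$1 drives=$2
  echo $(( aggregate_mbps / drives ))
}

for agg in 1600 1800; do
  echo "${agg} MB/s over 12 drives: ~$(per_drive_mbps "$agg" 12) MB/s per drive"
done
# "a quarter of that" on the older cards (~400-450 MB/s aggregate):
echo "older cards: ~$(per_drive_mbps 400 12)-$(per_drive_mbps 450 12) MB/s per drive"
```

That works out to roughly 133-150 MB/s per drive on the newer card versus roughly 33-37 MB/s on the older ones.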


On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <discord@uw.edu>
wrote:

> Hi,
> I'm trying to figure out the ideal settings for using ext4 on
> hadoop cluster datanodes. The Hadoop site recommends setting the nodelalloc
> option in fstab. Is that still a preferred option?
> I read elsewhere to disable the ext4 journal and use data=writeback.
> http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html
> Finally, in some slides I read to use dir_index,sparse_super,extent when
> creating the filesystem, and to mount with noatime and nodiratime.
> http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud

Travis Campbell
