hadoop-mapreduce-user mailing list archives

From "Brian C. Huffman" <bhuff...@etinternational.com>
Subject Re: ext4 on a hadoop cluster datanodes
Date Wed, 12 Nov 2014 18:47:40 GMT
Would this set of ext4 parameters be ok for a 500GB HDFS data drive?

Thanks,
Brian

On 10/06/2014 06:09 PM, Travis wrote:
> For filesystem creation, we use the following with mkfs.ext4
>
> mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L $HDFS_LABEL /dev/${DEV}1
>
> By default, mkfs creates way too many inodes, so we tune it a bit with 
> the "largefile" option, which modifies the inode_ratio.  This gives us 
> ~2 million usable inodes on a 2TB filesystem.
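
The inode figure above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes the stock /etc/mke2fs.conf values (inode_ratio = 1048576 for "largefile", 16384 for the default type) and decimal drive sizes; estimate_inodes is a hypothetical helper, and mkfs rounds per block group, so real counts will differ somewhat.

```shell
#!/bin/sh
# Rough inode estimate: filesystem size in bytes divided by bytes-per-inode.
estimate_inodes() {
    # $1 = filesystem size in bytes, $2 = inode_ratio (bytes per inode)
    echo $(( $1 / $2 ))
}

GB=$((1000 * 1000 * 1000))
TB=$((1000 * GB))
LARGEFILE_RATIO=1048576   # assumed: "largefile" type in stock mke2fs.conf
DEFAULT_RATIO=16384       # assumed: stock default inode_ratio

echo "2TB, largefile:   $(estimate_inodes $((2 * TB)) $LARGEFILE_RATIO)"
echo "2TB, default:     $(estimate_inodes $((2 * TB)) $DEFAULT_RATIO)"
echo "500GB, largefile: $(estimate_inodes $((500 * GB)) $LARGEFILE_RATIO)"
```

Under those assumptions a 500GB drive would end up with a bit under half a million inodes. HDFS keeps a data file and a .meta file for each block replica, so whether that is enough depends on how many small blocks the datanode is expected to hold.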
>
> As well, by default, mkfs sets the block reserve to 5%, which wastes a 
> fair amount of space, since this space is only accessible to the root 
> user.  We tune this down to 1% at mkfs time, but you can use tune2fs 
> at runtime to change it.
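
To put numbers on the reserve, here is a minimal sketch that compares the 5% and 1% settings for a 500GB drive (decimal bytes assumed). reserved_bytes is a hypothetical helper; the tune2fs line is left as a comment because it needs root and a real device, named here as in the mkfs command above.

```shell
#!/bin/sh
# Space held back for root at a given reserved-blocks percentage.
reserved_bytes() {
    # $1 = filesystem size in bytes, $2 = reserved percentage (integer)
    echo $(( $1 * $2 / 100 ))
}

SIZE=$((500 * 1000 * 1000 * 1000))   # 500GB, decimal

echo "5% reserve: $(reserved_bytes "$SIZE" 5) bytes"
echo "1% reserve: $(reserved_bytes "$SIZE" 1) bytes"

# To lower the reserve on an existing filesystem (as root):
#   tune2fs -m 1 /dev/${DEV}1
```

On that drive the difference works out to roughly 20GB of space that HDFS can use.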
>
> I don't know that I would use writeback. This mode is problematic in 
> the event of a crash because it can allow old data to exist on the FS, 
> but with new metadata.  I consider this corruption.  Unless you know 
> your environment to be super stable (meaning no OS or hardware-induced 
> crashes) AND you have stable, UPS-backed power, I would steer clear of 
> this.
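
If you want to check whether any datanode volume is already mounted this way, something like the following may help. It is a sketch only: uses_writeback is a hypothetical helper, and it only catches data=writeback when the option appears explicitly in /proc/mounts.

```shell
#!/bin/sh
# Succeeds if a comma-separated mount option list contains data=writeback.
uses_writeback() {
    # $1 = mount options, e.g. the 4th field of a /proc/mounts line
    case ",$1," in
        *,data=writeback,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn about every ext4 mount that lists data=writeback.
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        [ "$fstype" = "ext4" ] || continue
        if uses_writeback "$opts"; then
            echo "WARNING: $mnt ($dev) is mounted with data=writeback"
        fi
    done < /proc/mounts
fi
```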
>
> If you're looking for the utmost in filesystem performance, you're 
> better off looking at the controller card you're using.  Right now, 
> we're using LSI9207-8i and seeing an aggregate 1.6-1.8GBytes/sec 
> throughput across 12 drives in JBOD.  Our older LSI-based cards can 
> only sustain maybe a quarter of that in the same disk configuration.
>
> Travis
>
> On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>     Hi,
>
>     I'm trying to figure out the ideal settings for using ext4 on
>     Hadoop cluster datanodes. The Hadoop site recommends setting the
>     nodelalloc option in the fstab. Is that still the preferred
>     option?
>
>     I read elsewhere to disable the ext4 journal, and use data=writeback.
>
>     http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html
>
>     Finally, I read in some slides to use
>     dir_index,sparse_super,extent when creating the filesystem, and
>     to mount with noatime and nodiratime.
>
>     http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud
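
For reference, the mount options named in the question would sit in the fourth field of an fstab entry. The sketch below only prints such a line and is not a recommendation of any particular option set: fstab_line is a hypothetical helper, the label "hdfs01" and mount point "/data/1" are made up, and the trailing "0 0" (no dump, no boot-time fsck) is an assumption.

```shell
#!/bin/sh
# Print an fstab entry for an ext4 filesystem identified by its label.
fstab_line() {
    # $1 = filesystem label, $2 = mount point, $3 = comma-separated options
    printf 'LABEL=%s %s ext4 %s 0 0\n' "$1" "$2" "$3"
}

# Hypothetical label and mount point; options as listed in the question.
fstab_line "hdfs01" "/data/1" "noatime,nodiratime,nodelalloc"
```

The label would be whatever was passed to mkfs.ext4 with -L when the filesystem was created.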
>
>
> -- 
> Travis Campbell
> travis@ghostar.org

