hadoop-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Re: ext4 on a hadoop cluster datanodes
Date Wed, 08 Oct 2014 00:11:15 GMT
Hi Travis,

Are you using SSDs or spinning disks in your configuration?

Thanks,

Colin Williams

On Mon, Oct 6, 2014 at 3:09 PM, Travis <hcoyote@ghostar.org> wrote:

> For filesystem creation, we use the following with mkfs.ext4
>
> mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L $HDFS_LABEL /dev/${DEV}1
>
> By default, mkfs creates way too many inodes, so we tune it a bit with the
> "largefile" option, which modifies the inode_ratio.  This gives us ~2
> million usable inodes on a 2TB filesystem.
>
> As well, by default, mkfs sets the block reserve to 5%, which wastes a
> fair amount of space, since this space is only accessible to the root
> user.  We tune this down to 1% at mkfs time, but you can use tune2fs at
> runtime to change it.
>
> I don't know that I would use writeback. This mode is problematic in the
> event of a crash because it can allow old data to exist on the FS, but with
> new metadata.  I consider this corruption.  Unless you know your
> environment to be super stable (meaning no OS or hardware-induced crashes)
> AND you have stable, UPS-backed power, I would steer clear of this.
>
> If you're looking for the utmost in filesystem performance, you're better
> off looking at the controller card you're using.  Right now, we're using
> LSI9207-8i and seeing an aggregate 1.6-1.8GBytes/sec throughput across 12
> drives in JBOD.  Our older LSI-based cards can only sustain maybe a quarter
> of that in the same disk configuration.
>
> Travis
>
> On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>
>> Hi,
>>
>> I'm trying to figure out the best settings for using ext4 on Hadoop
>> cluster datanodes. The Hadoop site recommends setting the nodelalloc
>> option in fstab. Is that still the preferred option?
>>
>> I read elsewhere that I should disable the ext4 journal and use data=writeback.
>>
>> http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html
>>
>> Finally, in some slides I read that you should use
>> dir_index,sparse_super,extent when creating the filesystem, and mount
>> with noatime and nodiratime.
>>
>>
>> http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud
>
>
> --
> Travis Campbell
> travis@ghostar.org
>

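For anyone checking the numbers in Travis's message, here is a rough sketch of the arithmetic behind the inode count and the reserved-block figures. It assumes the stock mke2fs.conf values (an inode_ratio of 1048576 bytes for the largefile type and 16384 for the default type) and a decimal 2 TB partition. The function names are made up for illustration, and distributions can change these ratios, so check /etc/mke2fs.conf on your own systems.

```shell
#!/bin/sh
# Back-of-the-envelope estimate only; it does not touch any device.

fs_bytes=2000000000000     # assumed partition size: 2 TB (decimal)
largefile_ratio=1048576    # assumed: "largefile" inode_ratio in stock mke2fs.conf
default_ratio=16384        # assumed: default inode_ratio in stock mke2fs.conf

# mke2fs allocates roughly one inode per inode_ratio bytes of filesystem.
estimate_inodes() {
    echo $(( $1 / $2 ))
}

# Bytes held back for root at a given reserve percentage (the -m option).
reserved_bytes() {
    echo $(( $1 * $2 / 100 ))
}

echo "inodes with default ratio:   $(estimate_inodes "$fs_bytes" "$default_ratio")"
echo "inodes with largefile ratio: $(estimate_inodes "$fs_bytes" "$largefile_ratio")"
echo "reserved at 5%: $(reserved_bytes "$fs_bytes" 5) bytes"
echo "reserved at 1%: $(reserved_bytes "$fs_bytes" 1) bytes"
```

Under those assumptions, largefile gives about 1.9 million inodes on 2 TB, which lines up with the ~2 million figure quoted above, and dropping the reserve from 5% to 1% frees roughly 80 GB per drive.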
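
The runtime side of the advice in this thread (changing the reserve with tune2fs on an existing filesystem, and mounting with noatime) could look something like the sketch below. It only prints the tune2fs command and an fstab line so they can be reviewed first. The device, label, and mount point are hypothetical placeholders, the helper names are made up, and the trailing "0 0" dump/pass fields are an assumption. data=writeback is left out on purpose, given the caution quoted above, and nodelalloc is not included because the thread does not settle that question.

```shell
#!/bin/sh
# Prints the commands for review; nothing here modifies a filesystem.

DEV_PART=${DEV_PART:-/dev/sdb1}       # hypothetical data partition
HDFS_LABEL=${HDFS_LABEL:-hdfs01}      # hypothetical filesystem label
MOUNT_POINT=${MOUNT_POINT:-/data/1}   # hypothetical mount point
MOUNT_OPTS=${MOUNT_OPTS:-noatime}     # on Linux, noatime also covers directories

# tune2fs -m sets the reserved-block percentage on an existing ext filesystem.
reserve_cmd() {
    printf 'tune2fs -m %s %s\n' "$1" "$2"
}

# fstab fields: device, mount point, type, options, dump, fsck pass.
fstab_line() {
    printf 'LABEL=%s %s ext4 %s 0 0\n' "$1" "$2" "$3"
}

reserve_cmd 1 "$DEV_PART"
fstab_line "$HDFS_LABEL" "$MOUNT_POINT" "$MOUNT_OPTS"
```

Override the variables from the environment (for example MOUNT_OPTS=noatime,nodelalloc) to try other combinations before putting anything in /etc/fstab.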