hadoop-hdfs-user mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject our experiences with various filesystems and tuning options
Date Thu, 05 May 2011 20:45:56 GMT

We've run benchmarks comparing ext3 and xfs filesystems under different
tuning options. The results might be useful to others.

The datanode cluster consists of 15 slave nodes, each with 1Gbit
ethernet, an X3220@2.40GHz quad-core CPU and 4x1TB disks. Raw disk read
speeds range from about 90 to 130MB/s (measured with hdparm -t).

Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org) 
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))

# our command
for i in `seq 1 10`; do
  ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar randomwriter \
    -Ddfs.replication=1 /rand$i && \
  ./hadoop fs -rmr /rand$i/_logs /rand$i/_SUCCESS && \
  ./hadoop distcp -Ddfs.replication=1 /rand$i /rand-copy$i
done

Our benchmark consists of a standard random-writer job followed by a
distcp of the same data, both with replication set to 1. This ensures
that only the local disks get hit, not the network via replication.
Each benchmark is run several times for every configuration. Because of
the occasional hiccup, I will list both the average and the fastest
times for each configuration. Execution times were read off the
jobtracker.

The configurations (with execution times in seconds, as Avg-writer /
Min-writer / Avg-distcp / Min-distcp):
ext3-default      158 / 136 / 411 / 343
ext3-tuned        159 / 132 / 330 / 297
ra1024 ext3-tuned 159 / 132 / 292 / 264
ra1024 xfs-tuned  128 / 122 / 220 / 202

To explain: ext3-tuned means ext3 with the tuned mount options
[noatime,nodiratime,data=writeback,rw], and ra1024 means a read-ahead
buffer of 1024 blocks. The xfs filesystems were created with the mkfs
options [size=128m,lazy-count=1] and mounted with
[noatime,nodiratime,logbufs=8].
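For reference, a rough sketch of how these settings could be applied on
one disk. The device and mount point names are placeholders, and we are
assuming the xfs size=128m option refers to the log section (passed via
-l to mkfs.xfs) and that read-ahead was set with blockdev --setra:

```shell
# Hypothetical device and mount point; adjust for your hardware.
DEV=/dev/sdb1
MNT=/data/1

# xfs: assumed to be log-section options, i.e. mkfs.xfs -l
mkfs.xfs -f -l size=128m,lazy-count=1 $DEV
mount -o noatime,nodiratime,logbufs=8 $DEV $MNT

# ext3 alternative with the tuned mount options from above:
# mount -o noatime,nodiratime,data=writeback,rw $DEV $MNT

# read-ahead buffer of 1024 blocks (512-byte sectors) for the device
blockdev --setra 1024 $DEV
```

Note that data=writeback weakens ext3's ordering guarantees after a
crash, so it trades some safety for speed.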

In conclusion, tuned xfs filesystems combined with increased read-ahead
buffers improved our basic hdfs performance by roughly 10%
(random-writer) to 40% (distcp) over default ext3.

Hopefully this is useful to someone. Although I won't be performing
more tests soon, I'd be happy to provide more details.
