hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop performance benchmarking with TestDFSIO
Date Thu, 29 Sep 2011 09:58:09 GMT
On 28/09/11 22:45, Sameer Farooqui wrote:
> Hi everyone,
>
> I'm looking for some recommendations for how to get our Hadoop cluster to do
> faster I/O.
>
> Currently, our lab cluster is 8 worker nodes and 1 master node (with
> NameNode and JobTracker).
>
> Each worker node has:
> - 48 GB RAM
> - 16 processors (Intel Xeon E5630 @ 2.53 GHz)
> - 1 Gb Ethernet connection
>
>
> Due to company policy, we have to keep the HDFS storage on a disk array. Our
> SAS-connected array is capable of 6 Gb/s (768 MB/s) to each of the 8 hosts. So,
> theoretically, we should be able to get a max of 6 GB/s of simultaneous reads
> across the 8 nodes if we benchmark it.

That's missing the point of Hadoop; you will end up getting the bandwidth 
of the HDD most likely to fail next, HDFS block replication on top of RAID 
is overkill, and you will hit limits on scale, both technical (SAN 
scalability) and financial.

>
> Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN
> is RAID-5 across 12 disks on the array. That LUN is partitioned on the
> server into 6 different devices like this:
>



> The file system type is ext3.

set noatime
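
For example, a minimal sketch of what that looks like (device names and 
mount points here are placeholders, not your actual layout):

  # Remount an HDFS data partition with noatime, so reads stop
  # triggering access-time updates on every file.
  mount -o remount,noatime /dev/sdb1 /data/1

  # Matching /etc/fstab entry so the option survives a reboot:
  #   /dev/sdb1   /data/1   ext3   defaults,noatime   0 0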

>
> So, when we run TestDFSIO, here are the results:
>
> *++ Write ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write
> -nrFiles 80 -fileSize 10000
>
> 11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/27 18:54:53 INFO fs.TestDFSIO:            Date & time: Tue Sep 27
> 18:54:53 EDT 2011
> 11/09/27 18:54:53 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 18:54:53 INFO fs.TestDFSIO:      Throughput mb/sec: 8.2742240008678
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec:
> 8.288116455078125
> 11/09/27 18:54:53 INFO fs.TestDFSIO:  IO rate std deviation:
> 0.3435565217052116
> 11/09/27 18:54:53 INFO fs.TestDFSIO:     Test exec time sec: 1427.856
>
> So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.
>
>
> *++ Read ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read
> -nrFiles 80 -fileSize 10000
>
> 11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/27 19:43:12 INFO fs.TestDFSIO:            Date & time: Tue Sep 27
> 19:43:12 EDT 2011
> 11/09/27 19:43:12 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 19:43:12 INFO fs.TestDFSIO:      Throughput mb/sec:
> 5.854318503905489
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec:
> 5.96372652053833
> 11/09/27 19:43:12 INFO fs.TestDFSIO:  IO rate std deviation:
> 0.9885505979030621
> 11/09/27 19:43:12 INFO fs.TestDFSIO:     Test exec time sec: 2055.465
>
>
> So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.
>
>
> *Question 1:* Why are the reads and writes so much slower than expected? Any
> suggestions about what can be changed? I understand that RAID-5 backed disks
> are an unorthodox configuration for HDFS, but has anybody successfully done
> this? If so, what kind of results did you see?


>
>
> Also, we detached the 8 nodes from the disk array and connected each of them
> to 6 local hard drives for testing (w/ ext4 file system). Then we ran the
> same read TestDFSIO and saw this:
>
> 11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/26 20:24:09 INFO fs.TestDFSIO:            Date & time: Mon Sep 26
> 20:24:09 EDT 2011
> 11/09/26 20:24:09 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 20:24:09 INFO fs.TestDFSIO:      Throughput mb/sec:
> 13.065623285187982
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec:
> 15.160531997680664
> 11/09/26 20:24:09 INFO fs.TestDFSIO:  IO rate std deviation:
> 8.000530562022949
> 11/09/26 20:24:09 INFO fs.TestDFSIO:     Test exec time sec: 1123.447
>
>
> So, with local disks, reads are about 1 GB per second across the 8 nodes.
> Much faster!

Much lower cost per TB too. Orders of magnitude lower.

>
> With 6 local disks, writes performed the same though:
>
> 11/09/26 19:49:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/26 19:49:58 INFO fs.TestDFSIO:            Date & time: Mon Sep 26
> 19:49:58 EDT 2011
> 11/09/26 19:49:58 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/26 19:49:58 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 19:49:58 INFO fs.TestDFSIO:      Throughput mb/sec:
> 8.573949802610528
> 11/09/26 19:49:58 INFO fs.TestDFSIO: Average IO rate mb/sec:
> 8.588902473449707
> 11/09/26 19:49:58 INFO fs.TestDFSIO:  IO rate std deviation:
> 0.3639466752546032
> 11/09/26 19:49:58 INFO fs.TestDFSIO:     Test exec time sec: 1383.734
>
>
> Write throughput across the cluster was 685 MB per second.

Writes get streamed to multiple HDFS nodes for redundancy: you pay the 
network overhead on top of the disk bandwidth, and you write 3x the data.
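
A quick way to see that factor at work (the path below is TestDFSIO's 
usual output directory; adjust it if yours differs):

  # Count the replicas behind each TestDFSIO block
  hadoop fsck /benchmarks/TestDFSIO -files -blocks | head -40

  # To take replication out of a write test, you could set dfs.replication
  # to 1 in the client's hdfs-site.xml before the run, or drop the
  # replication of the files it already wrote:
  hadoop fs -setrep -R -w 1 /benchmarks/TestDFSIO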


Options:
  - Stop using HDFS on the SAN; it's the wrong approach. Mount the SAN 
directly and use file:// URLs, and let the SAN do the networking and 
redundancy.
  - Buy some local HDDs, at least for all the temporary data: logs and the 
MapReduce spill space (mapred.local.dir). You don't need redundancy there.
A sketch of both is below.
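
Rough sketch, with placeholder paths (your mount points and local-disk 
layout will differ; the examples jar name just mirrors the test jar used 
above):

  # Option 1: run a job straight against the SAN mount, no HDFS involved
  hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-CDH3B4.jar wordcount \
      file:///mnt/san/input file:///mnt/san/output

  # Option 2: in mapred-site.xml, spread the MapReduce temp/spill space
  # across local, non-replicated disks:
  #   <property>
  #     <name>mapred.local.dir</name>
  #     <value>/local1/mapred,/local2/mapred,/local3/mapred</value>
  #   </property>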

