hadoop-user mailing list archives

From Gaurav Dasgupta <gdsay...@gmail.com>
Subject TestDFSIO info required
Date Thu, 30 Aug 2012 07:14:08 GMT
Hi,

I ran TestDFSIO in my Hadoop cluster:
*hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240*
The report generated is:
*12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write*
*12/08/30 01:31:34 INFO fs.TestDFSIO:            Date & time: Thu Aug 30 01:31:34 CDT 2012*
*12/08/30 01:31:34 INFO fs.TestDFSIO:        Number of files: 100*
*12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0*
*12/08/30 01:31:34 INFO fs.TestDFSIO:      Throughput mb/sec: 5.54130695296031*
*12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516*
*12/08/30 01:31:34 INFO fs.TestDFSIO:  IO rate std deviation: 1.503623716482166*
*12/08/30 01:31:34 INFO fs.TestDFSIO:     Test exec time sec: 3490.168*

I was referring to this blog post:

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/



As per my understanding from that blog, I calculated *Throughput =
(1024000*1000)/3490.168 =  293395.61*, which is of course not my throughput.

Then I found a file in the HDFS output directory of the job:

*hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000* gave me this:



*f:rate 587506.5
f:sqrate 3677727.2
l:size 1073741824000
l:tasks 100
l:time 184793950*

Then I used the time from this file in the same formula: *Throughput =
(1024000*1000)/184793950 = 5.541*, which matches my reported throughput.
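
For reference, the figures in the report can be reproduced from the part-00000 values. Below is a minimal Python sketch. The formulas are inferred only from the numbers in this message, not taken from the TestDFSIO source, and assume that l:size is in bytes, l:time is in milliseconds, and f:rate and f:sqrate are sums that were scaled by 1000. The function name `summarize` and the dictionary keys are hypothetical.

```python
import math

MEGA = 1024 * 1024  # bytes per MB, assuming binary megabytes

# Accumulator values copied from part-00000 above.
stats = {
    "rate": 587506.5,       # f:rate
    "sqrate": 3677727.2,    # f:sqrate
    "size": 1073741824000,  # l:size, assumed to be bytes
    "tasks": 100,           # l:tasks
    "time": 184793950,      # l:time, assumed to be milliseconds
}


def summarize(s):
    """Recompute the report lines from the accumulators (inferred formulas)."""
    total_mb = s["size"] / MEGA
    # Same arithmetic as in the message: MB * 1000 / time.
    throughput = total_mb * 1000 / s["time"]
    # Assumes each task's rate was multiplied by 1000 before being summed.
    avg_rate = s["rate"] / 1000 / s["tasks"]
    variance = abs(s["sqrate"] / 1000 / s["tasks"] - avg_rate ** 2)
    return {
        "total_mb": total_mb,
        "throughput_mb_sec": throughput,
        "avg_io_rate_mb_sec": avg_rate,
        "io_rate_std_dev": math.sqrt(variance),
    }


result = summarize(stats)
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions the results agree with the "Total MBytes processed", "Throughput mb/sec", "Average IO rate mb/sec" and "IO rate std deviation" lines of the report to within rounding of the printed values.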



Can someone tell me what exactly this time in the HDFS output
directory file "part-00000" is?



Thanks,

Gaurav Dasgupta
