hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"
Date Wed, 07 Jan 2009 00:15:17 GMT
Hi tienduc_dinh,

Just a bit of background, which should help answer your questions.
TestDFSIO mappers perform one operation (read or write) each, measure
the time taken by the operation and output the following three values:
(I am intentionally omitting some other output stuff.)
- size(i)
- time(i)
- rate(i) = size(i) / time(i)
Here i is the index of the map task, 0 <= i < N, and N is the "-nrFiles"
value, which equals the number of maps.

Then the reduce sums those values and writes them into "part-00000".
That is, you get three fields in it:
size = size(0) + ... + size(N-1)
time = time(0) + ... + time(N-1)
rate = rate(0) + ... + rate(N-1)

Then we calculate
throughput = size / time
averageIORate = rate / N
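
To make the difference between the two numbers concrete, here is a minimal
sketch of the aggregation described above. It is not the TestDFSIO source:
the class name, method name, and sample measurements are made up for
illustration, and the units (MB and seconds) are an assumption.

```java
import java.util.Locale;

public class DfsioAggregateSketch {
    // Hypothetical helper: takes per-task sizes (assumed MB) and times
    // (assumed seconds) and returns {throughput, averageIORate}.
    static double[] aggregate(double[] sizeMb, double[] timeSec) {
        int n = sizeMb.length;           // N, the number of map tasks
        double size = 0, time = 0, rate = 0;
        for (int i = 0; i < n; i++) {
            size += sizeMb[i];                 // size = size(0) + ... + size(N-1)
            time += timeSec[i];                // time = time(0) + ... + time(N-1)
            rate += sizeMb[i] / timeSec[i];    // rate = rate(0) + ... + rate(N-1)
        }
        double throughput = size / time;       // total size over total time
        double averageIORate = rate / n;       // mean of the per-task rates
        return new double[] {throughput, averageIORate};
    }

    public static void main(String[] args) {
        // Two made-up tasks writing 100 MB each, one fast and one slow.
        double[] r = aggregate(new double[] {100, 100}, new double[] {10, 50});
        System.out.printf(Locale.ROOT,
                "throughput=%.3f averageIORate=%.3f%n", r[0], r[1]);
    }
}
```

With a single task (N = 1) both values coincide. With several tasks of
unequal speed they differ, because throughput weights each task by its time
while averageIORate treats every task equally.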

So, answering your questions:
- There should be only one reduce task, otherwise you will have to
manually sum corresponding values in "part-00000" and "part-00001".
- The value of the ":rate" after the reduce equals the sum of individual
rates of each operation. So if you want to have an average you should
divide it by the number of tasks rather than multiply.
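
If you did end up with several part files, combining them by hand could look
roughly like the sketch below. It assumes each line of a part file is a key
followed by whitespace and a numeric value. That layout, the key names in the
example, and the class and method names are all assumptions for illustration,
so check them against your actual output before relying on this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartFileSumSketch {
    // Hypothetical helper: sums numeric values per key across all lines
    // collected from "part-00000", "part-00001", and so on.
    static Map<String, Double> sumByKey(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                       // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                continue;                       // skip lines not shaped like "key value"
            }
            // Add the value to the running total for this key.
            totals.merge(parts[0], Double.parseDouble(parts[1]), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up contents of two part files, concatenated.
        List<String> lines = Arrays.asList(
                "size\t2048", "time\t60000",    // as if from part-00000
                "rate\t34.1");                  // as if from part-00001
        System.out.println(sumByKey(lines));
    }
}
```

The merged totals can then be plugged into the throughput and averageIORate
formulas above.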

Now, in your case you create only one file ("-nrFiles 1"), which means
you run only one map task.
Setting "mapred.map.tasks" to 10 in hadoop-site.xml only defines the default
number of map tasks per job. See here:
http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
In case of TestDFSIO it will be overridden by "-nrFiles".

Hope this answers your questions.
Thanks,
--Konstantin



tienduc_dinh wrote:
> Hello,
> 
> I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and 4
> slaves. In hadoop-site.xml the value of "mapred.map.tasks" is 10. Because
> the values "throughput" and "average IO rate" are similar, I just post the
> values of "throughput" of the same command with 3 times running
> 
> - > hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048
> -nrFiles 1
> 
> + with "dfs.replication = 1" => 33,60 / 31,48 / 30,95
> 
> + with "dfs.replication = 2" => 26,40 / 20,99 / 21,70
> 
> I find something strange while reading the source code. 
> 
> - The value of mapred.reduce.tasks is always set to 1 
> 
> job.setNumReduceTasks(1) in the function runIOTest()  and reduceFile = new
> Path(WRITE_DIR, "part-00000") in analyzeResult().
> 
> So I think, if we properly have mapred.reduce.tasks = 2, we will have on the
> file system 2 Paths to "part-00000" and "part-00001", e.g.
> /benchmarks/TestDFSIO/io_write/part-00000
> 
> - And i don't understand the line with "double med = rate / 1000 / tasks".
> Is it not "double med = rate * tasks / 1000 "
