Hi Konstantin,

thanks so much for your help. I was a little confused about why Hadoop didn't map anything after I set mapred.map.tasks = 10 in hadoop-site.xml. So your answer

> In case of TestDFSIO it will be overridden by "-nrFiles".

is the key. Now I need your confirmation that I've understood it correctly.

+ If I want to write 2 GB with 1 map task, I should use the following command:

> hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles 1

The throughput values are, e.g., 33.60 / 31.48 / 30.95.

+ If I want to write 2 GB with 4 map tasks, I should use the following command (512 MB per file, 4 files):

> hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 512 -nrFiles 4

The throughput values are, e.g., 31.50 / 32.09 / 30.56.

Can you please explain to me why the values in case 2 aren't much better? I have 1 master and 4 slaves, and if I calculate correctly, they should even be about 4 times higher, right?

Sorry for my poor English skills, and thanks very much for your help.

Tien Duc Dinh


Konstantin Shvachko wrote:
>
> Hi tienduc_dinh,
>
> Just a bit of background, which should help to answer your questions.
> TestDFSIO mappers each perform one operation (read or write), measure
> the time taken by the operation, and output the following three values
> (I am intentionally omitting some other output):
> - size(i)
> - time(i)
> - rate(i) = size(i) / time(i)
> i is the index of the map task, 0 <= i < N, and N is the "-nrFiles" value,
> which equals the number of maps.
>
> Then the reduce sums those values and writes them into "part-00000".
> That is, you get three fields in it:
> size = size(0) + ... + size(N-1)
> time = time(0) + ... + time(N-1)
> rate = rate(0) + ... + rate(N-1)
>
> Then we calculate
> throughput = size / time
> averageIORate = rate / N
>
> So, answering your questions:
> - There should be only one reduce task; otherwise you will have to
> manually sum the corresponding values in "part-00000" and "part-00001".
> - The value of ":rate" after the reduce equals the sum of the individual
> rates of each operation. So if you want an average, you should
> divide it by the number of tasks rather than multiply.
>
> Now, in your case you create only one file ("-nrFiles 1"), which means
> you run only one map task.
> Setting "mapred.map.tasks" to 10 in hadoop-site.xml defines the default
> number of tasks per job. See here:
> http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
> In the case of TestDFSIO it will be overridden by "-nrFiles".
>
> Hope this answers your questions.
> Thanks,
> --Konstantin
>
>
> tienduc_dinh wrote:
>> Hello,
>>
>> I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master
>> and 4 slaves. In hadoop-site.xml the value of "mapred.map.tasks" is 10.
>> Because the values of "throughput" and "average IO rate" are similar,
>> I'll just post the "throughput" values from running the same command
>> 3 times:
>>
>> - > hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048
>> -nrFiles 1
>>
>> + with "dfs.replication = 1" => 33.60 / 31.48 / 30.95
>>
>> + with "dfs.replication = 2" => 26.40 / 20.99 / 21.70
>>
>> I found something strange while reading the source code.
>>
>> - The value of mapred.reduce.tasks is always set to 1:
>> job.setNumReduceTasks(1) in the function runIOTest(), and reduceFile =
>> new Path(WRITE_DIR, "part-00000") in analyzeResult().
>>
>> So I think, if we had mapred.reduce.tasks = 2, we would get two output
>> paths on the file system, "part-00000" and "part-00001", e.g.
>> /benchmarks/TestDFSIO/io_write/part-00000
>>
>> - And I don't understand the line "double med = rate / 1000 / tasks".
>> Shouldn't it be "double med = rate * tasks / 1000"?


--
View this message in context: http://www.nabble.com/Re%3A-TestDFSIO-delivers-bad-values-of-%22throughput%22-and-%22average-IO-rate%22-tp21322404p21332803.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
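[Editor's note: not part of the original thread.] The arithmetic Konstantin describes can be sketched as follows. The helper `analyze` and the sample per-map numbers are made up for illustration and are not TestDFSIO code; they just follow the definitions in the email (rate(i) = size(i) / time(i), the reducer sums each field, throughput = size / time, averageIORate = rate / N). It also shows why the reported throughput stays near the single-map value even with 4 maps: total size and total time both grow by roughly N.

```python
# Sketch of the TestDFSIO reduce/analysis arithmetic from the thread.
# Each map writes one file and reports size (MB), time (ms), and its rate.

def analyze(maps):
    # The single reducer sums each field across all N maps.
    size = sum(m["size"] for m in maps)               # total MB written
    time = sum(m["time"] for m in maps)               # total map time, ms
    rate = sum(m["size"] / m["time"] for m in maps)   # sum of per-map rates

    n = len(maps)
    throughput = size / time    # overall MB per ms across all maps
    avg_io_rate = rate / n      # average: divide by N, don't multiply
    return throughput, avg_io_rate

# Hypothetical run: 4 maps, each writing 512 MB in roughly the same time.
maps = [{"size": 512, "time": 16000},
        {"size": 512, "time": 17000},
        {"size": 512, "time": 15500},
        {"size": 512, "time": 16500}]

throughput, avg = analyze(maps)
# With 4 maps, size and time in the sums above both grow about 4x, so
# "throughput" is close to the per-task rate of the single-map case.
# The aggregate cluster rate would approach N * throughput only if the
# maps genuinely ran in parallel on separate nodes.
```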