From: tienduc_dinh
To: core-user@hadoop.apache.org
Date: Wed, 7 Jan 2009 06:51:18 -0800 (PST)
Subject: Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

Hi Konstantin,

thanks so much for your help. I was a little confused about why my setting
mapred.map.tasks = 10 in hadoop-site.xml had no effect. So your answer

> In case of TestDFSIO it will be overridden by "-nrFiles".

is the key.

I now need your confirmation that I've understood it correctly.

+ If I want to write 2 GB with 1 map task, I should use the following
  command.

> hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles 1

The values of throughput are, e.g., 33,60 / 31,48 / 30,95.

+ If I want to write 2 GB with 4 map tasks, I should use the following
  command.

> hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 5012 -nrFiles 4

The values of throughput are, e.g., 31,50 / 32,09 / 30,56.

Can you please explain to me why the values in case 2 are not much better?
I have 1 master and 4 slaves, and if I calculate it right, they should even
be 4 times higher, right?

Sorry for my poor English and thanks very much for your help.

Tien Duc Dinh


Konstantin Shvachko wrote:
>
> Hi tienduc_dinh,
>
> Just a bit of background, which should help to answer your questions.
> TestDFSIO mappers perform one operation (read or write) each, measure
> the time taken by the operation and output the following three values:
> (I am intentionally omitting some other output stuff.)
> - size(i)
> - time(i)
> - rate(i) = size(i) / time(i)
> i is the index of the map task 0 <= i < N, and N is the "-nrFiles" value,
> which equals the number of maps.
>
> Then the reduce sums those values and writes them into "part-00000".
> That is, you get three fields in it
> size = size(0) + ... + size(N-1)
> time = time(0) + ... + time(N-1)
> rate = rate(0) + ... + rate(N-1)
>
> Then we calculate
> throughput = size / time
> averageIORate = rate / N
>
> So answering your questions
> - There should be only one reduce task, otherwise you will have to
>   manually sum corresponding values in "part-00000" and "part-00001".
> - The value of the ":rate" after the reduce equals the sum of individual
>   rates of each operation. So if you want to have an average you should
>   divide it by the number of tasks rather than multiply.
>
> Now, in your case you create only one file "-nrFiles 1", which means
> you run only one map task.
> Setting "mapred.map.tasks" to 10 in hadoop-site.xml defines the default
> number of tasks per job. See here
> http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
> In case of TestDFSIO it will be overridden by "-nrFiles".
>
> Hope this answers your questions.
> Thanks,
> --Konstantin
>
>
> tienduc_dinh wrote:
>> Hello,
>>
>> I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and
>> 4 slaves. In hadoop-site.xml the value of "mapred.map.tasks" is 10. Because
>> the values "throughput" and "average IO rate" are similar, I just post the
>> values of "throughput" of the same command with 3 times running
>>
>> - > hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles 1
>>
>> + with "dfs.replication = 1" => 33,60 / 31,48 / 30,95
>>
>> + with "dfs.replication = 2" => 26,40 / 20,99 / 21,70
>>
>> I find something strange while reading the source code.
>>
>> - The value of mapred.reduce.tasks is always set to 1
>>
>> job.setNumReduceTasks(1) in the function runIOTest() and reduceFile = new
>> Path(WRITE_DIR, "part-00000") in analyzeResult().
>>
>> So I think, if we properly have mapred.reduce.tasks = 2, we will have on the
>> file system 2 Paths to "part-00000" and "part-00001", e.g.
>> /benchmarks/TestDFSIO/io_write/part-00000
>>
>> - And I don't understand the line with "double med = rate / 1000 / tasks".
>> Is it not "double med = rate * tasks / 1000 "
>
>

-- 
View this message in context: http://www.nabble.com/Re%3A-TestDFSIO-delivers-bad-values-of-%22throughput%22-and-%22average-IO-rate%22-tp21322404p21332803.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
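
The aggregation Konstantin describes in the quoted reply (sum size(i), time(i)
and rate(i) over the N map tasks, then divide) can be written out roughly as
below. This is a minimal Java sketch, not the TestDFSIO source: the class name
DfsioMetrics, both method names and the input numbers are hypothetical, and
unit scaling (such as the "/ 1000" in the quoted line) is ignored. Under these
formulas, N tasks that each run at the same rate give the same throughput as a
single task, because size and time both grow N-fold, so the reported value
reads as a per-task figure rather than a cluster-wide sum.

```java
// Illustrative sketch of the formulas quoted above; NOT the TestDFSIO source.
// Class name, method names and sample numbers are assumptions for this example.
final class DfsioMetrics {

    // throughput = (size(0) + ... + size(N-1)) / (time(0) + ... + time(N-1))
    // Assumes sizes and times have the same length N (one entry per map task).
    static double throughput(double[] sizes, double[] times) {
        double size = 0.0;
        double time = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            size += sizes[i];
            time += times[i];
        }
        return size / time;
    }

    // averageIORate = (rate(0) + ... + rate(N-1)) / N, with rate(i) = size(i) / time(i)
    static double averageIORate(double[] sizes, double[] times) {
        double rate = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            rate += sizes[i] / times[i];
        }
        return rate / sizes.length;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: sizes in MB, times in seconds.
        double[] oneSize = {2048};
        double[] oneTime = {64};
        double[] fourSizes = {512, 512, 512, 512};
        double[] fourTimes = {16, 16, 16, 16};

        System.out.println(throughput(oneSize, oneTime));        // prints 32.0
        System.out.println(throughput(fourSizes, fourTimes));    // prints 32.0
        System.out.println(averageIORate(fourSizes, fourTimes)); // prints 32.0
    }
}
```

With these made-up inputs, one task writing 2048 MB in 64 s and four tasks each
writing 512 MB in 16 s both report 32 MB/s, even though the four-task run would
move the same data in a quarter of the wall-clock time if the tasks ran
concurrently.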