Date: Thu, 30 Aug 2012 12:44:08 +0530
Subject: TestDFSIO info required
From: Gaurav Dasgupta
To: user@hadoop.apache.org

Hi,

I ran TestDFSIO on my Hadoop cluster:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240

The report generated is:

12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/08/30 01:31:34 INFO fs.TestDFSIO: Date & time: Thu Aug 30 01:31:34 CDT 2012
12/08/30 01:31:34 INFO fs.TestDFSIO: Number of files: 100
12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0
12/08/30 01:31:34 INFO fs.TestDFSIO: Throughput mb/sec: 5.54130695296031
12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516
12/08/30 01:31:34 INFO fs.TestDFSIO: IO rate std deviation: 1.503623716482166
12/08/30 01:31:34 INFO fs.TestDFSIO: Test exec time sec: 3490.168

I was referring to this blog post:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Based on my understanding of that post, I calculated

Throughput = (1024000*1000)/3490.168 = 293395.61

which is clearly not the throughput shown in the report.

Then I found a file in the HDFS output directory of the job:

hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000

gave me this:

f:rate 587506.5
f:sqrate 3677727.2
l:size 1073741824000
l:tasks 100
l:time 184793950

When I use this time value in the same formula, I get

Throughput = (1024000*1000)/184793950 = 5.541

which matches the reported throughput.

Can someone tell me what exactly this time value in the HDFS output file "part-00000" represents?

Thanks,
Gaurav Dasgupta
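
For reference, the arithmetic in this message can be checked with a short script. This is only a sketch: the relationships between the part-00000 fields and the report lines are inferred from the numbers posted above, not quoted from TestDFSIO itself, and the function and variable names (`dfsio_metrics`, `MEGA`, `time_ms`) are made up for illustration. It assumes `l:time` is in milliseconds, `l:size` is in bytes, and `f:rate` and `f:sqrate` are per-task sums scaled by 1000.

```python
import math

MEGA = 1024 * 1024  # assumed bytes per MB: 1073741824000 / MEGA == 1024000


def dfsio_metrics(rate, sqrate, size, tasks, time_ms):
    """Recompute the report figures from the part-00000 fields.

    The formulas are assumptions that happen to reproduce the numbers
    in the posted report; they are not taken from the Hadoop source.
    """
    total_mb = size / MEGA
    # Same formula as in the message, with l:time as the divisor.
    throughput = total_mb * 1000 / time_ms
    # Assumed: f:rate is the sum of per-task rates, scaled by 1000.
    avg_io_rate = rate / 1000 / tasks
    # Assumed: f:sqrate is the sum of squared per-task rates, scaled by 1000.
    std_dev = math.sqrt(abs(sqrate / 1000 / tasks - avg_io_rate ** 2))
    return {
        "total_mb": total_mb,
        "throughput_mb_sec": throughput,
        "avg_io_rate_mb_sec": avg_io_rate,
        "io_rate_std_dev": std_dev,
        # l:time divided evenly over the tasks, converted to seconds.
        "mean_task_time_sec": time_ms / tasks / 1000,
    }


# Values copied from the part-00000 output quoted in the message.
metrics = dfsio_metrics(
    rate=587506.5,
    sqrate=3677727.2,
    size=1073741824000,
    tasks=100,
    time_ms=184793950,
)

# Total MB divided by the wall-clock "Test exec time sec" from the report.
aggregate_mb_sec = metrics["total_mb"] / 3490.168

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value}")
    print(f"aggregate_mb_sec: {aggregate_mb_sec}")
```

Under these assumptions, all three report figures (throughput, average IO rate, standard deviation) come out of the part-00000 fields. Dividing `l:time` by the 100 tasks gives a per-task time well below the 3490.168 s wall-clock time, which would be consistent with `l:time` being a sum of per-task I/O times rather than the job's elapsed time, though that is an interpretation of the numbers and not a confirmed answer to the question.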