From: praveenesh kumar <praveenesh@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 29 Aug 2012 03:44:40 -0500
Subject: Re: MRBench Maps strange behaviour

Then the question arises: how is MRBench using the parameters? According to the mail he sent, he is running MRBench with the following command:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is expecting MRBench to launch 10 mappers and 10 reducers, but he is getting different results, which are visible in the counters, and we can use our usual map and input-split reasoning to explain the counter outputs.

So the questions here are: how should we use MRBench, and what does it provide? How can we control it with different parameters to do some benchmarking? Can someone explain how to use MRBench and what exactly it does?

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yhemanth@gmail.com> wrote:
> I assume you are asking about the exact number of maps launched.
> If so, the output of the MRBench run prints the counter
> "Launched map tasks". That is the exact number of maps launched.
>
> Thanks
> Hemanth
>
> On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
> > Hi Hemanth,
> >
> > Thanks for the reply.
> > Can you tell me how I can calculate or verify from the counters what
> > the exact number of maps should be?
> > Thanks,
> > Gaurav Dasgupta
> >
> > On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yhemanth@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> The number of maps specified to any MapReduce program (including
> >> those that are part of MRBench) is generally only a hint, and the actual
> >> number of maps will in typical cases be influenced by the amount of data
> >> being processed. You can take a look at this wiki link to understand
> >> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> >>
> >> In the examples below, since the data you've generated is different,
> >> the number of mappers is different. To be able to judge your
> >> benchmark results, you'd need to benchmark against the same data (or
> >> at least the same kind of data, i.e. the same size and type).
> >>
> >> The number of maps printed at the end is taken straight from the input
> >> you specified and doesn't reflect what the job actually ran with. The
> >> information from the counters is the right one.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I executed the "MRBench" program from "hadoop-test.jar" on my 12-node
> >> > CDH3 cluster. After executing it, I had some strange observations
> >> > regarding the number of maps it ran.
> >> >
> >> > First I ran the command:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200
> >> > -reduces 200 -inputLines 1024 -inputType random
> >> > And I could see that the actual number of maps it ran was 201 (for all
> >> > 3 runs) instead of 200 (though the end report displays the launched
> >> > maps to be 200). Here is the console report:
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >> >
> >> > Again, I ran MRBench for just 10 maps and 10 reduces:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >> >
> >> > This time the actual number of maps was only 2, and again the end report
> >> > displays the launched maps to be 10.
> >> > The console output:
> >> >
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> >> > DataLines  Maps  Reduces  AvgTime (milliseconds)
> >> > 1          20    20       17451
> >> >
> >> > Can someone please help me understand this behaviour of Hadoop in this
> >> > case? My main purpose in running MRBench is to calculate the average time
> >> > for a certain number of maps, reduces, input lines, etc. If the number of
> >> > maps is not what I submitted, then how can I judge my benchmark results?
> >> >
> >> > Thanks,
> >> >
> >> > Gaurav Dasgupta
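
To make Hemanth's point concrete, below is a minimal sketch against the old org.apache.hadoop.mapred API that hadoop-0.20 ships. It is not the MRBench source; the class name, input/output paths, and the counter group/name used for the lookup are illustrative assumptions. The idea it shows: JobConf.setNumMapTasks() is only a hint that the InputFormat may override when it computes splits, JobConf.setNumReduceTasks() is honoured by the framework, and the authoritative map count is the job's "Launched map tasks" counter.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Hypothetical demo class, not part of MRBench or hadoop-test.jar.
public class MapHintDemo {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapHintDemo.class);
    conf.setJobName("map-hint-demo");

    // Identity mapper/reducer (the JobConf defaults) are enough here;
    // we only care about how many map tasks actually get launched.
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // e.g. a single small text file
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setNumMapTasks(10);     // only a hint: the splits computed by the InputFormat decide
    conf.setNumReduceTasks(10);  // the reduce count is taken as given by the framework

    RunningJob job = JobClient.runJob(conf);

    // The counters are the authoritative record of what actually ran.
    // The group/name below are the hadoop-0.20-era identifiers behind the
    // "Launched map tasks" job counter (an assumption; check your version).
    Counters counters = job.getCounters();
    long launchedMaps = counters
        .findCounter("org.apache.hadoop.mapred.JobInProgress$Counter",
                     "TOTAL_LAUNCHED_MAPS")
        .getCounter();
    System.out.println("Launched map tasks = " + launchedMaps);
  }
}

Run against a single small input file, a job like this would typically report only one or two launched maps regardless of the hint, which is consistent with the second MRBench run quoted above.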