From: Shing Hing Man
Date: Tue, 2 Oct 2012 11:17:08 -0700 (PDT)
To: "user@hadoop.apache.org", "bejoy.hadoop@gmail.com"
Subject: Re: How to lower the total number of map tasks

I have done the following.

1) stop-all.sh
2) In mapred-site.xml, added
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
 
(dfs.block.size remains unchanged at 67108864)

3) start-all.sh

4) Use hadoop fs -cp src destn to copy my original file to another HDFS directory.
5) Run my MapReduce program using the new copy of the input file.

However, in the job.xml I still get mapred.map.tasks = 242, which is the same as before.
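
A minimal sketch of why step 2 may leave the map count unchanged, assuming splits are sized by the rule max(minSize, min(maxSize, blockSize)), which is what the newer (org.apache.hadoop.mapreduce) FileInputFormat applies as far as I know. Under that rule mapred.max.split.size can only cap a split below the block size; it is mapred.min.split.size that pushes a split above it. Whether the job here honours mapred.max.split.size at all depends on which input format API it uses, so treat that as an assumption too. The class and method names are hypothetical, and the real input format has extra details (such as slack on the last split) that this ignores.

```java
// Sketch of the assumed split-size clamp; not Hadoop's actual source.
final class SplitSizeSketch {
    /** Assumed rule: the block size, capped by maxSize, then raised to at least minSize. */
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb; // dfs.block.size = 67108864, as in the thread

        // Settings from the steps above: min split 0, max split 128 MB.
        System.out.println(splitSize(0L, 128 * mb, block));             // 67108864

        // Hypothetical alternative: raise mapred.min.split.size to 128 MB instead.
        System.out.println(splitSize(128 * mb, Long.MAX_VALUE, block)); // 134217728
    }
}
```

With the posted settings the split stays at one 64 MB block, which would match the unchanged 242 map tasks.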


I have also tried deleting my input file in HDFS and importing it again from my local drive.

Any more ideas?

Shing


________________________________
From: Bejoy KS <bejoy.hadoop@gmail.com>
To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com>
Sent: Tuesday, October 2, 2012 6:37 PM
Subject: Re: How to lower the total number of map tasks

Shing

Setting dfs.block.size doesn't change the block size of existing files in HDFS; only files written afterwards are affected. To get this in effect for old files you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

________________________________

From: Shing Hing Man <matmsh@yahoo.com>
Date: Tue, 2 Oct 2012 10:33:45 -0700 (PDT)
To: user@hadoop.apache.org <user@hadoop.apache.org>
ReplyTo: user@hadoop.apache.org
Subject: Re: How to lower the total number of map tasks


I set the block size using
  Configuration.setInt("dfs.block.size", 134217728);

I have also set it in mapred-site.xml.

Shing


________________________________
From: Chris Nauroth <cnauroth@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com>
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks

Those numbers make sense, considering 1 map task per block. 16 GB file / 64 MB block size = ~242 map tasks.

When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?
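
The arithmetic above can be sketched as a ceiling division, assuming one map task per HDFS block. The file size of 16,200,000,000 bytes is a hypothetical stand-in for "about 16 GB" (the thread does not give the exact size), and the class and method names are illustrative only.

```java
// Sketch: how many HDFS blocks (and so, roughly, map tasks) a file occupies.
final class BlockCount {
    /** Ceiling division: a partly filled last block still counts as one block. */
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 16_200_000_000L;      // hypothetical size, "about 16 GB"
        long block64 = 64L * 1024 * 1024;      // 67108864
        long block128 = 128L * 1024 * 1024;    // 134217728

        System.out.println(blocksFor(fileBytes, block64));  // 242
        System.out.println(blocksFor(fileBytes, block128)); // 121
    }
}
```

So a file that was really rewritten with 128 MB blocks should show roughly half as many map tasks; a count that stays at 242 suggests the file on HDFS still has 64 MB blocks.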

Thank you,
--Chris

On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <matmsh@yahoo.com> wrote:

> I am running Hadoop 1.0.3 in pseudo-distributed mode.
> When I submit a map/reduce job to process a file of size about 16 GB, in job.xml, I have the following
>
> mapred.map.tasks = 242
> mapred.min.split.size = 0
> dfs.block.size = 67108864
>
> I would like to reduce mapred.map.tasks to see if it improves performance.
> I have tried doubling the size of dfs.block.size. But the mapred.map.tasks remains unchanged.
> Is there a way to reduce mapred.map.tasks?
>
> Thanks in advance for any assistance!
> Shing
