hadoop-user mailing list archives

From Shing Hing Man <mat...@yahoo.com>
Subject Re: How to lower the total number of map tasks
Date Tue, 02 Oct 2012 18:17:08 GMT
I have done the following.

1)  stop-all.sh
2)  In mapred-site.xml,  added
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>

  

(dfs.block.size remains unchanged at 67108864)

3) start-all.sh 


4) Use hadoop fs -cp src destn to copy my original file to another HDFS directory.

5) Run my MapReduce program using the new copy of the input file.

 
However, in job.xml I still get mapred.map.tasks = 242, the same as before.


I have also tried deleting my input file in HDFS and importing it again from my local drive.
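
One thing I have not tried yet is raising mapred.min.split.size (currently 0 in my job.xml) instead. If I understand FileInputFormat correctly, mapred.max.split.size can only cap splits at or below the block size, so it cannot make them larger, whereas a minimum split size above 64 MB should force bigger splits. A rough, untested sketch of setting it per job in the driver (old mapred API; the class name is just a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FewerMapsDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FewerMapsDriver.class);
    // Ask for splits of at least 128 MB; with 64 MB blocks each map task
    // should then cover about two blocks, roughly halving the number of
    // map tasks (at some cost to data locality).
    conf.setLong("mapred.min.split.size", 134217728L);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Mapper/reducer classes would be set here as in my existing job.
    JobClient.runJob(conf);
  }
}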


Any more ideas?

Shing 




________________________________
 From: Bejoy KS <bejoy.hadoop@gmail.com>
To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:37 PM
Subject: Re: How to lower the total number of map tasks
 

Shing

This doesn't change the block size of existing files in HDFS; only new files written to HDFS
will be affected. To get this to take effect for old files you need to re-copy them, at least
within HDFS:
hadoop fs -cp src destn.
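
If the plain cp still comes out with 64 MB blocks (the block size is taken from the client-side configuration at write time), you can also force it explicitly while rewriting the file. A rough sketch, not tested, with the source and destination paths taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RecopyWithBlockSize {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = fs.open(new Path(args[0]));
    // create() takes the block size explicitly, so the new copy gets
    // 128 MB blocks regardless of what dfs.block.size the client sees.
    FSDataOutputStream out = fs.create(new Path(args[1]), true, 4096,
        fs.getDefaultReplication(), 134217728L);
    IOUtils.copyBytes(in, out, 4096, true);  // copies and closes both streams
  }
}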


Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________

From:  Shing Hing Man <matmsh@yahoo.com> 
Date: Tue, 2 Oct 2012 10:33:45 -0700 (PDT)
To: user@hadoop.apache.org<user@hadoop.apache.org>
ReplyTo:  user@hadoop.apache.org 
Subject: Re: How to lower the total number of map tasks



I set the block size using
  Configuration.setInt("dfs.block.size", 134217728);

I have also set it in mapred-site.xml.
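
To double-check, I suppose I could print the block size the input file was actually written with, since it is fixed per file at write time. A rough sketch (the path argument would be my input file):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSize {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path(args[0]));
    // The block size is a per-file attribute decided when the file is written.
    System.out.println(st.getPath() + " len=" + st.getLen()
        + " blockSize=" + st.getBlockSize());
  }
}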

Shing 



________________________________
 From: Chris Nauroth <cnauroth@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
 

Those numbers make sense, considering 1 map task per block.  16 GB file / 64 MB block size
= ~242 map tasks.
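
(As a quick check: 242 maps x 64 MB is about 15.1 GB, which matches a file of roughly 16 GB, so with 128 MB blocks the same data should come out to about 121 map tasks.)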

When you doubled dfs.block.size, how did you accomplish that?  Typically, the block size
is selected at file write time, with a default value from system configuration used if not
specified.  Did you "hadoop fs -put" the file with the new block size, or was it something
else?

Thank you,
--Chris


On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <matmsh@yahoo.com> wrote:


>I am running Hadoop 1.0.3 in pseudo-distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml
>I have the following:
>
>mapred.map.tasks = 242
>mapred.min.split.size = 0
>dfs.block.size = 67108864
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size, but mapred.map.tasks
>remains unchanged.
>Is there a way to reduce mapred.map.tasks?
>
>Thanks in advance for any assistance!
>Shing