hadoop-user mailing list archives

From Romedius Weiss <Romedius.We...@student.uibk.ac.at>
Subject Re: How to lower the total number of map tasks
Date Wed, 03 Oct 2012 04:00:39 GMT
Hi!

According to the article on YDN*:
"The on-node parallelism is controlled by the   
mapred.tasktracker.map.tasks.maximum parameter."

[http://developer.yahoo.com/hadoop/tutorial/module4.html]
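For completeness, this is roughly how that knob looks in mapred-site.xml (the value 4 is only an illustration, pick it to match your node's cores). Note that it caps the number of map tasks running *concurrently* on one TaskTracker; it does not change how many map tasks the job has in total:

```xml
<!-- mapred-site.xml: caps concurrent map slots per TaskTracker.
     This does NOT reduce the total number of map tasks of a job. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```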

Also, I think it's better to set the min size instead of the max size,
so the algorithm tries to slice the file into chunks of a certain
minimal size.

Have you tried writing a custom InputFormat? That might be another,
more drastic solution.
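To see why the min size matters: as far as I know, the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input) picks the split size as max(minSize, min(maxSize, blockSize)). A back-of-the-envelope sketch with the sizes from this thread (plain Java, no Hadoop needed, so take it as an approximation of what the framework does rather than its actual code):

```java
// Sketch of the split-size rule used (as far as I know) by the
// new-API FileInputFormat:
//   splitSize = max(minSize, min(maxSize, blockSize))
// Sizes below are the ones mentioned in this thread.
public class SplitMath {

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long block = 67108864L;                // dfs.block.size = 64 MB
        long file  = 16L * 1024 * 1024 * 1024; // ~16 GB input file

        // Defaults (minSize ~ 1, maxSize = Long.MAX_VALUE): 64 MB splits.
        long def = computeSplitSize(block, 1L, Long.MAX_VALUE);
        System.out.println(numSplits(file, def));  // 256 map tasks

        // Raising the *minimum* split size to 128 MB overrides the block
        // size and halves the number of map tasks:
        long big = computeSplitSize(block, 134217728L, Long.MAX_VALUE);
        System.out.println(numSplits(file, big));  // 128 map tasks
    }
}
```

If I remember correctly, the old mapred API computes splits differently (from a goal size derived from the requested number of maps) and does not honour mapred.max.split.size, which could explain why setting only the max had no effect.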

Cheers, R


Zitat von Shing Hing Man <matmsh@yahoo.com>:

> I only have one big input file.
>
> Shing
>
>
> ________________________________
>  From: Bejoy KS <bejoy.hadoop@gmail.com>
> To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com>
> Sent: Tuesday, October 2, 2012 6:46 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Hi Shing
>
> Is your input a single file or a set of small files? If the latter, you
> need to use CombineFileInputFormat.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ________________________________
>
> From:  Shing Hing Man <matmsh@yahoo.com>
> Date: Tue, 2 Oct 2012 10:38:59 -0700 (PDT)
> To: user@hadoop.apache.org<user@hadoop.apache.org>
> ReplyTo:  user@hadoop.apache.org
> Subject: Re: How to lower the total number of map tasks
>
>
> I have tried
>        conf.setInt("mapred.max.split.size", 134217728);
>
> and setting mapred.max.split.size in mapred-site.xml
> (dfs.block.size is left unchanged at 67108864).
>
> But in the job.xml, I am still getting mapred.map.tasks = 242.
>
> Shing
>
>
>
>
>
>
> ________________________________
>  From: Bejoy Ks <bejoy.hadoop@gmail.com>
> To: user@hadoop.apache.org; Shing Hing Man <matmsh@yahoo.com>
> Sent: Tuesday, October 2, 2012 6:03 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Sorry for the typo, the property name is mapred.max.split.size
>
> Also just for changing the number of map tasks you don't need to  
> modify the hdfs block size.
>
>
> On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>
> Hi
>>
>>
>> You need to alter the value of mapred.max.split size to a value
>> larger than your block size to get fewer map tasks than
>> the default.
>>
>>
>>
>> On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man <matmsh@yahoo.com> wrote:
>>
>>
>>>
>>>
>>> I am running Hadoop 1.0.3 in pseudo-distributed mode.
>>> When I submit a map/reduce job to process a file of about
>>> 16 GB in size, job.xml contains the following:
>>>
>>>
>>> mapred.map.tasks = 242
>>> mapred.min.split.size = 0
>>> dfs.block.size = 67108864
>>>
>>>
>>> I would like to reduce mapred.map.tasks to see if it improves
>>> performance.
>>> I have tried doubling the size of dfs.block.size, but
>>> mapred.map.tasks remains unchanged.
>>> Is there a way to reduce mapred.map.tasks?
>>>
>>>
>>> Thanks in advance for any assistance!
>>> Shing
>>>
>>>
>>


