hadoop-common-user mailing list archives

From CubicDesign <cubicdes...@gmail.com>
Subject Re: Processing 10MB files in Hadoop
Date Thu, 26 Nov 2009 15:58:56 GMT
But the documentation DOES recommend setting it:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces



PS: I am using streaming
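
For reference, a minimal sketch of how that hint could be passed to a
streaming job (the jar path, input/output paths, and the value 100 are
placeholders, not taken from this thread):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -D mapred.map.tasks=100 \
        -input /user/me/input \
        -output /user/me/output \
        -mapper /bin/cat \
        -reducer /usr/bin/wc

Note that mapred.map.tasks is only a hint: the actual number of map
tasks is still decided by how the InputFormat splits the input.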
 


Jeff Zhang wrote:
> Actually, you do not need to set the number of map tasks; the InputFormat
> will compute it for you according to your input data set.
>
> Jeff Zhang
>
>
> On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign <cubicdesign@gmail.com> wrote:
>
>
>>> The number of mappers is determined by your InputFormat.
>>>
>>> In the common case, if a file is smaller than one block (64 MB by
>>> default), it gets one mapper. If a file is larger than one block,
>>> Hadoop will split it, and the number of mappers for that file will
>>> be ceiling((size of file) / (size of block)).
>>>
>> Hi
>>
>> Do you mean I should set the number of map tasks to 1?
>> I want to process this file not on a single node but over the entire
>> cluster. I need a lot of processing power in order to finish the job in
>> hours instead of days.
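
To make the quoted formula concrete, here is a small, self-contained
sketch (not from the thread; the 200 MB file is hypothetical, while
10 MB matches the file from the original question):

    // mappers = ceiling(fileSize / blockSize), as stated in the quoted reply
    public class MapperCountSketch {
        static long mappers(long fileSize, long blockSize) {
            return (fileSize + blockSize - 1) / blockSize;  // integer ceiling division
        }

        public static void main(String[] args) {
            long mb = 1024L * 1024;
            System.out.println(mappers(10 * mb, 64 * mb));   // 10 MB file  -> 1 mapper
            System.out.println(mappers(200 * mb, 64 * mb));  // 200 MB file -> 4 mappers
        }
    }

So with the default 64 MB block size, a 10 MB input gets exactly one
mapper, which is why it runs on a single node rather than across the
cluster.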
