hadoop-common-user mailing list archives

From "Goel, Ankur" <ankur.g...@corp.aol.com>
Subject RE: MultiFileInputFormat - Not enough mappers
Date Fri, 11 Jul 2008 13:56:59 GMT
In this case I have to compute the number of map tasks in the
application as (totalSize / blockSize), which is what I am doing as a
workaround. I think this should be the default behaviour in
MultiFileInputFormat. Should a JIRA be opened for the same?
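
A minimal sketch of that workaround, assuming the old
org.apache.hadoop.mapred API of the time; the class and method names
below are illustrative, not taken from the original mail:

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskHint {
      // Sets the map task hint to roughly totalSize / blockSize for the
      // files directly under the given input directory.
      public static void setMapHint(JobConf conf, Path inputDir) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        long totalSize = 0;
        for (FileStatus file : fs.listStatus(inputDir)) {
          if (!file.isDir()) {
            totalSize += file.getLen();   // sum the sizes of the small input files
          }
        }
        long blockSize = fs.getDefaultBlockSize();
        // one map per block's worth of data, never fewer than one
        int numMaps = (int) Math.max(1, totalSize / blockSize);
        conf.setNumMapTasks(numMaps);
      }
    }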


-----Original Message-----
From: Enis Soztutar [mailto:enis.soz.nutch@gmail.com] 
Sent: Friday, July 11, 2008 7:21 PM
To: core-user@hadoop.apache.org
Subject: Re: MultiFileInputFormat - Not enough mappers

MultiFileSplit currently does not support automatic map task count
computation. You can manually set the number of maps via
jobConf#setNumMapTasks() or via command line arg -D
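
A minimal sketch of those two options, assuming a driver built on
Tool/ToolRunner so that generic -D options are honoured; the driver
name and values are illustrative, and mapred.map.tasks is the pre-0.20
property name:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MapCountDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MapCountDriver.class);
        // Option 1: set the hint programmatically.
        // conf.setNumMapTasks(20);
        // Option 2: pass it on the command line; GenericOptionsParser copies
        // "-D mapred.map.tasks=20" into getConf() before run() is called:
        //   hadoop jar myjob.jar MapCountDriver -D mapred.map.tasks=20
        System.out.println("map tasks hint: " + conf.getNumMapTasks());
        return 0;
      }
      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MapCountDriver(), args));
      }
    }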

Goel, Ankur wrote:
> Hi Folks,
>               I am using Hadoop to process some temporal data which is
> split into a lot of small files (~3-4 MB). Using TextInputFormat
> resulted in too many mappers (1 per file), creating a lot of overhead,
> so I switched to MultiFileInputFormat
> (MultiFileWordCount.MyInputFormat), which resulted in just 1 mapper.
> I was hoping to set the number of mappers to 1 so that Hadoop
> automatically takes care of generating the right number of map tasks.
> Looks like when using MultiFileInputFormat one has to rely on the
> application to specify the right number of mappers, or am I missing
> something? Please advise.
> Thanks
> -Ankur
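
For context, a minimal sketch of the setup described in the quoted
question, assuming the MultiFileWordCount example bundled with Hadoop at
the time; the job name and paths are illustrative:

    import org.apache.hadoop.examples.MultiFileWordCount;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class TemporalJobSetup {
      public static JobConf configure() {
        JobConf conf = new JobConf(TemporalJobSetup.class);
        conf.setJobName("temporal-data");
        // MultiFileInputFormat packs many small files into each split; the
        // number of splits (and hence mappers) follows the map task hint,
        // which defaults to 1 -- the single-mapper behaviour described above.
        conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/data/temporal"));
        FileOutputFormat.setOutputPath(conf, new Path("/data/temporal-out"));
        return conf;
      }
    }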
