hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: num of reducer
Date Thu, 16 Feb 2012 16:36:16 GMT
Is your data size 100-200MB *total*?

If so, then this is the expected behavior for MultiFileInputFormat. As
Bejoy says, you can switch to TextInputFormat to get one mapper per block
(and at least one mapper per file).
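To illustrate the difference, here is a rough sketch of the split arithmetic, assuming the 0.19-era default HDFS block size of 64MB (the class name and figures below are illustrative, not from this thread):

```java
public class SplitCountSketch {

    // TextInputFormat creates roughly one split per HDFS block,
    // with at least one split per file (files are never combined).
    static long textInputFormatSplits(int numFiles, long fileBytes, long blockBytes) {
        long splitsPerFile = Math.max(1, (fileBytes + blockBytes - 1) / blockBytes);
        return (long) numFiles * splitsPerFile;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1000 files of ~150MB each with a 64MB block size:
        // TextInputFormat would yield about 3000 map tasks, whereas
        // MultiFileInputFormat packs many whole files into each split,
        // so it can produce only a handful of mappers.
        System.out.println(textInputFormatSplits(1000, 150 * mb, 64 * mb)); // prints 3000
    }
}
```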

-Joey

On Thu, Feb 16, 2012 at 11:03 AM, Thamizhannal Paramasivam <
thamizhannal.p@gmail.com> wrote:

> Here are the input format for mapper.
> Input Format: MultiFileInputFormat
> MapperOutputKey : Text
> MapperOutputValue: CustomWritable
>
> I am not in a position to upgrade hadoop-0.19.2, for certain reasons.
>
> I checked the number of mappers on the job tracker.
>
> Thanks,
> Thamizh
>
>
> On Thu, Feb 16, 2012 at 6:56 PM, Joey Echeverria <joey@cloudera.com> wrote:
>
>> Hi Tamil,
>>
>> I'd recommend upgrading to a newer release, as 0.19.2 is very old. As for
>> your question, most input formats should set the number of mappers
>> correctly. What input format are you using? Where did you see the number
>> of tasks assigned to the job?
>>
>> -Joey
>>
>>
>> On Thu, Feb 16, 2012 at 1:40 AM, Thamizhannal Paramasivam <
>> thamizhannal.p@gmail.com> wrote:
>>
>>> Hi All,
>>> I am using hadoop-0.19.2 and running a mapper-only job on a cluster. Its
>>> input path has >1000 files of 100-200MB each. Since it is a mapper-only
>>> job, I set the number of reducers to 0. It is using only 2 mappers to
>>> process all the input files. If we did not state the number of mappers,
>>> wouldn't it pick one mapper per input file? Or wouldn't the default pick
>>> a reasonable number of mappers based on the number of input files?
>>> Thanks,
>>> tamil
>>
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>>
>


-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
