hadoop-mapreduce-user mailing list archives

From Thamizhannal Paramasivam <thamizhanna...@gmail.com>
Subject Re: num of reducer
Date Fri, 17 Feb 2012 04:56:57 GMT
Thank you so much, Joey and Bejoy, for your suggestions.

The job's input path has 1300-1400 text files, each 100-200MB in size.

I thought TextInputFormat spawns a single mapper per file, while
MultiFileInputFormat spawns fewer mappers (fewer than 1300-1400), each of
which processes several input files.
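
For a rough sense of scale, the arithmetic can be sketched as below. This is
only an estimate, not Hadoop code. It assumes the 64MB default HDFS block
size, that TextInputFormat yields roughly one split per block (and at least
one per file), and that MultiFileInputFormat groups files into at most as
many splits as the requested number of map tasks (mapred.map.tasks, which
defaults to 2 as far as I know). The function names are made up for
illustration.

```python
import math

MB = 1024 * 1024


def text_input_splits(file_sizes, block_size=64 * MB):
    """Estimate: one split per HDFS block, and at least one split per file."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def multi_file_splits(file_sizes, requested_maps=2):
    """Assumed behaviour: files are grouped into at most `requested_maps` splits."""
    return min(len(file_sizes), requested_maps)


# Hypothetical input: 1300 files of 150MB each
sizes = [150 * MB] * 1300
print(text_input_splits(sizes))   # 3900 (3 blocks per file)
print(multi_file_splits(sizes))   # 2
```

Under these assumptions, the 2 mappers seen on the job tracker would follow
from the requested number of map tasks rather than from the number of files.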

Which input format do you think would be most appropriate in my case, and
why?

Looking forward to your reply.

Thanks,
Thamizh


On Thu, Feb 16, 2012 at 10:06 PM, Joey Echeverria <joey@cloudera.com> wrote:

> Is your data size 100-200MB *total*?
>
> If so, then this is the expected behavior for MultiFileInputFormat. As
> Bejoy says, you can switch to TextInputFormat to get one mapper per block
> (min one mapper per file).
>
> -Joey
>
>
> On Thu, Feb 16, 2012 at 11:03 AM, Thamizhannal Paramasivam <
> thamizhannal.p@gmail.com> wrote:
>
>> Here is the input format for the mapper.
>> Input Format: MultiFileInputFormat
>> MapperOutputKey : Text
>> MapperOutputValue: CustomWritable
>>
>> I am not in a position to upgrade from hadoop-0.19.2, for some reasons.
>>
>> I checked the number of mappers on the job tracker.
>>
>> Thanks,
>> Thamizh
>>
>>
>> On Thu, Feb 16, 2012 at 6:56 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>
>>> Hi Tamil,
>>>
>>> I'd recommend upgrading to a newer release as 0.19.2 is very old. As for
>>> your question, most input formats should set the number of mappers correctly.
>>> What input format are you using? Where did you see the number of tasks it
>>> assigned to the job?
>>>
>>> -Joey
>>>
>>>
>>> On Thu, Feb 16, 2012 at 1:40 AM, Thamizhannal Paramasivam <
>>> thamizhannal.p@gmail.com> wrote:
>>>
>>>> Hi All,
>>>> I am using hadoop-0.19.2 and running a mapper-only job on a cluster. Its
>>>> input path has >1000 files of 100-200MB. Since it is a mapper-only job, I
>>>> set the number of reducers to 0. So, it is using 2 mappers to process all
>>>> the input files. If we do not specify the number of mappers, wouldn't it
>>>> pick 1 mapper per input file? Or wouldn't the default pick a fair number
>>>> of mappers according to the number of input files?
>>>> Thanks,
>>>> tamil
>>>
>>>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>>
>>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
>
