hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: WholeFileInputFormat format
Date Tue, 10 Jul 2012 15:04:03 GMT
I don't see why you'd have to use the WholeFileInputFormat for such a
task. Your task is very similar to joins, and you can see the section
"General reducer-side join" for what your overall logic should look
like, under Ricky's
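A reduce-side join can be sketched in plain Python by simulating the map, shuffle and reduce phases in memory. This is not Hadoop API code. The source tags "file" and "hbase", the sample records and all function names are hypothetical. Each input gets its own map function that tags every value with its origin (MultipleInputs lets you bind one mapper class per input path, and each mapper does the tagging). The shuffle then groups values by key, and the reducer pairs up records from the two sides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Tagged = Tuple[str, str]  # (source tag, payload)


def tag_records(source: str, records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    """Map phase for one input: emit the join key with a source-tagged value."""
    for key, payload in records:
        yield key, (source, payload)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Stand-in for the framework's shuffle/sort: group values by key, keys sorted."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_join(key: str, values: List[Tagged]) -> Iterator[Tuple[str, str, str]]:
    """Reduce phase: split values by tag and emit every file/HBase pairing (inner join)."""
    left = [payload for tag, payload in values if tag == "file"]
    right = [payload for tag, payload in values if tag == "hbase"]
    for l_val in left:
        for r_val in right:
            yield key, l_val, r_val


# Usage with made-up records: only key "k1" appears in both inputs.
file_records = [("k1", "line-a"), ("k2", "line-b")]
hbase_records = [("k1", "row-x"), ("k3", "row-y")]
pairs = list(tag_records("file", file_records)) + list(tag_records("hbase", hbase_records))
joined = [row for key, vals in shuffle(pairs).items() for row in reduce_join(key, vals)]
print(joined)  # [('k1', 'line-a', 'row-x')]
```

The reducer holds both sides for a single key in memory. That is usually fine for record-to-row comparisons, where each key has only a handful of values.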

On Tue, Jul 10, 2012 at 7:46 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
> Hello Harsh,
>          Thank you so much for the quick response. Actually I have a
> use case wherein I have to compare values that are coming from 2
> mappers to one reducer. For that I am planning to use MultipleInputs
> class. In one mapper I have a text file (these files may contain
> 100,000 to 200,000 lines), and I have to extract bytes 2-13,
> 20-25, 32-38 and so on from each line of this file. In the second
> mapper I have to read values from an HBase table. The columns of this
> table correspond to the fields which I am reading from the text file
> in the first mapper.
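Pulling fixed byte ranges out of each line amounts to one slice per field. The sketch below assumes the positions in the message are 1-based and inclusive, so bytes 2-13 are twelve bytes. If they are 0-based or end-exclusive, adjust the offsets. FIELD_RANGES and extract_fields are hypothetical names, and only the three ranges actually listed are included. If you port this to a Java mapper that receives a Text value, note that Text.getBytes() returns the backing array, which is only valid up to Text.getLength().

```python
from typing import List, Sequence, Tuple

# Assumption: positions are 1-based and inclusive, as written in the message.
FIELD_RANGES: Sequence[Tuple[int, int]] = [(2, 13), (20, 25), (32, 38)]


def extract_fields(line: bytes, ranges: Sequence[Tuple[int, int]] = FIELD_RANGES) -> List[bytes]:
    """Return the byte slice for each (start, end) range.

    A line shorter than a range yields a truncated (possibly empty) slice
    instead of raising, so validate lengths separately if that matters.
    """
    return [line[start - 1:end] for start, end in ranges]


sample = b"0123456789" * 4  # 40 bytes; byte n (1-based) is str((n - 1) % 10)
print(extract_fields(sample))  # [b'123456789012', b'901234', b'1234567']
```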
>         In the reducer I have to compare the results coming from both
> mappers and generate the final result. I need your guidance. Many
> thanks.
> Regards,
>     Mohammad Tariq
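Once a join has brought a text record and an HBase row together under the same key, the comparison in the reducer can be a field-by-field diff. The sketch rests on several hypothetical assumptions. Both sides share the key, and the extracted fields map in order onto the column names in COLUMNS. The HBase values have also already been decoded to the same bytes the text file carries. The sketch also reports keys that appear on only one side, which a plain inner join would silently drop.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical column qualifiers; replace with the table's real column names.
COLUMNS: Sequence[str] = ("field_a", "field_b", "field_c")

Mismatch = Tuple[str, bytes, Optional[bytes]]  # (column, value from file, value from HBase)


def compare_for_key(
    key: str,
    file_fields: Optional[List[bytes]],
    hbase_row: Optional[Dict[str, bytes]],
    columns: Sequence[str] = COLUMNS,
) -> Tuple[str, str, List[Mismatch]]:
    """Reducer-side comparison for one key: returns (key, status, mismatches)."""
    if file_fields is None:
        return key, "ONLY_IN_HBASE", []
    if hbase_row is None:
        return key, "ONLY_IN_FILE", []
    # A column missing from the row compares as None, so it is reported as a mismatch.
    mismatches = [
        (col, fval, hbase_row.get(col))
        for col, fval in zip(columns, file_fields)
        if hbase_row.get(col) != fval
    ]
    return key, ("MATCH" if not mismatches else "MISMATCH"), mismatches


row = {"field_a": b"123", "field_b": b"XYZ", "field_c": b"7"}
print(compare_for_key("r1", [b"123", b"XY0", b"7"], row))
# ('r1', 'MISMATCH', [('field_b', b'XY0', b'XYZ')])
```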
> On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <harsh@cloudera.com> wrote:
>> It depends on what you need. If your file is not splittable, or if you
>> need to read the whole file from a single mapper itself (i.e. you do
>> not _want_ it to be split), then use a WholeFileInputFormat. Otherwise,
>> you get more parallelism with regular splitting.
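The parallelism difference comes from how many input splits, and therefore map tasks, a file produces. Below is a simplified Python model of the split arithmetic in Hadoop's FileInputFormat.getSplits. The split size is max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than the rest (SPLIT_SLOP = 1.1). The model ignores block locations and skips zero-length files. A non-splittable format, such as a WholeFileInputFormat whose isSplitable returns false, always yields one split covering the whole file. Sizes in the usage are in MB, with 64 MB being the Hadoop 1.x default HDFS block size.

```python
from typing import List, Tuple

SPLIT_SLOP = 1.1  # the constant FileInputFormat uses to let the last split run slightly long


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    return max(min_size, min(max_size, block_size))


def get_splits(
    file_length: int,
    block_size: int,
    splitable: bool = True,
    min_size: int = 1,
    max_size: int = 2**63 - 1,
) -> List[Tuple[int, int]]:
    """Return (offset, length) splits for one file, in whatever unit the caller uses."""
    if file_length == 0:
        return []  # simplification: this model skips empty files
    if not splitable:
        return [(0, file_length)]  # the whole file goes to a single map task
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[Tuple[int, int]] = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    if remaining != 0:
        splits.append((file_length - remaining, remaining))
    return splits


print(get_splits(300, 64))                   # [(0, 64), (64, 64), (128, 64), (192, 64), (256, 44)]
print(get_splits(70, 64))                    # [(0, 70)]  since 70 / 64 <= 1.1
print(get_splits(300, 64, splitable=False))  # [(0, 300)]
```

With splitting, the 300 MB file feeds five map tasks; marked non-splittable, it feeds one.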
>> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>> Hello list,
>>>        What could be the approximate maximum size of the files that
>>> can be handled using WholeFileInputFormat? I mean, if the file is
>>> very big, is it feasible to use WholeFileInputFormat, given that the
>>> entire load will go to one mapper? Many thanks.
>>> Regards,
>>>     Mohammad Tariq
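On the size question, the practical bound is memory rather than a fixed number. WholeFileInputFormat is not part of Hadoop itself, and the commonly copied version from Tom White's "Hadoop: The Definitive Guide" reads each split into a single byte array (new byte[(int) length]) and emits it as one BytesWritable value. If your implementation follows that pattern, a file must fit in the map task's heap and cannot exceed what one Java byte array can hold (a little under 2 GB). A line-oriented format such as TextInputFormat streams one record at a time instead. The Python sketch below only contrasts the two record shapes. The function names are hypothetical, and an in-memory stream stands in for an HDFS file.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def whole_file_records(stream: BinaryIO) -> Iterator[Tuple[None, bytes]]:
    """Whole-file style: a single record whose value is the entire file.

    The key is None here (the book's version uses NullWritable). The whole
    file is buffered at once, so memory use grows with file size.
    """
    yield None, stream.read()


def line_records(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Line-oriented style: key is the line's starting byte offset, value is the
    line without its trailing newline; only one line is held at a time."""
    offset = 0
    for raw in stream:
        yield offset, raw.rstrip(b"\r\n")
        offset += len(raw)


data = b"first line\nsecond\n"
print(list(whole_file_records(io.BytesIO(data))))  # [(None, b'first line\nsecond\n')]
print(list(line_records(io.BytesIO(data))))        # [(0, b'first line'), (11, b'second')]
```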
>> --
>> Harsh J

Harsh J
