hadoop-mapreduce-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: WholeFileInputFormat format
Date Tue, 10 Jul 2012 14:16:52 GMT
Hello Harsh,

         Thank you so much for the quick response. Actually, I have a
use case wherein I have to compare values coming from two mappers in
one reducer. For that I am planning to use the MultipleInputs
class. In one mapper I have a text file (these files may contain
100,000 to 200,000 lines), and I have to extract bytes 2-13,
20-25, 32-38 and so on from each line of this file. In the second
mapper I have to read values from an HBase table. The columns of this
table correspond to the fields that I am reading from the text file
in the first mapper.
        In the reducer I have to compare the results coming from both
mappers and generate the final result. Need your guidance. Many
thanks.
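For the text-side mapper, the per-line slicing described above can be sketched without any Hadoop dependency. This is a minimal sketch, assuming the ranges are 1-based inclusive positions and that the input is ASCII (so bytes and characters coincide); the class and method names are made up for illustration:

```java
// Hadoop-free sketch of the per-line field extraction described above.
// The ranges (2-13, 20-25, 32-38) are taken from the message; treating
// them as 1-based inclusive character positions is an assumption.
public class FieldExtractor {

    // 1-based inclusive [start, end] positions, as given in the message.
    static final int[][] RANGES = { {2, 13}, {20, 25}, {32, 38} };

    // Returns the requested slices of one input line. In a real job this
    // would run once per value inside the text-side Mapper's map() method,
    // and the slices would be emitted as map output for the reducer to
    // compare against the HBase-side values.
    public static String[] extractFields(String line) {
        String[] fields = new String[RANGES.length];
        for (int i = 0; i < RANGES.length; i++) {
            int start = RANGES[i][0] - 1;   // convert to 0-based index
            int end = RANGES[i][1];         // substring end is exclusive
            fields[i] = line.substring(start, Math.min(end, line.length()));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "0123456789012345678901234567890123456789";
        for (String f : extractFields(line)) {
            System.out.println(f);
        }
    }
}
```

Wiring this into MultipleInputs would then just be a matter of registering the text-side mapper for the file path and an HBase-backed mapper for the table, with both emitting a common key so matching records meet in the same reduce call.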

Regards,
    Mohammad Tariq


On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <harsh@cloudera.com> wrote:
> It depends on what you need. If your file is not splittable, or if you
> need to read the whole file from a single mapper (i.e. you do
> not _want_ it to be split), then use a WholeFileInputFormat. Otherwise,
> you get more parallelism with regular splitting.
>
> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>> Hello list,
>>
>>        What could be the approximate maximum size of the files that
>> can be handled using WholeFileInputFormat? I mean, if the file
>> is very big, is it feasible to use WholeFileInputFormat, as the
>> entire load will go to one mapper? Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
> --
> Harsh J
