hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Sharing data in a mapper for all values
Date Tue, 01 Nov 2011 03:34:12 GMT

Have you considered using Hive/Pig for the same kind of functionality instead?

There are also ways to do this with reducers, given proper group/sort
comparators (we would need to understand more about what you're trying to
achieve before suggesting a concrete solution), but the tools above may
offer a more 'natural' way out.

On Tue, Nov 1, 2011 at 5:15 AM, Arko Provo Mukherjee
<arkoprovomukherjee@gmail.com> wrote:
> Hello,
> I have a situation where I am reading a big file from HDFS and then
> comparing all the data in that file with each input to the mapper.
> Now, since my mapper is trying to read the entire HDFS file for each of its
> inputs, the amount of data it has to read and keep in memory is
> becoming large (file size * number of inputs to the mapper).
> Can we somehow avoid this by loading the file once for each mapper, such that
> the mapper can reuse the loaded file for each of the inputs that it
> receives?
> If this can be done, then for each mapper, I can just load the file once and
> then the mapper can use it for the entire slice of data that it receives.
> Thanks a lot in advance!
> Warm regards
> Arko
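
For reference, the load-once idea described in the quoted question can be
sketched in plain Java without any Hadoop dependencies. The class below is a
hypothetical stand-in, not a Hadoop class: its setup() and map() methods only
mimic the task lifecycle in which a one-time hook runs before the per-record
calls (in the newer org.apache.hadoop.mapreduce API that hook is
Mapper.setup(Context); in the older API it is configure(JobConf)). The set
membership check is an assumed placeholder for whatever comparison is
actually needed, and the side file is passed in as a Reader rather than
opened from HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a mapper; it only imitates the setup/map lifecycle.
class LoadOnceMapperSketch {
    // Side data kept in memory for the lifetime of this mapper instance.
    private final Set<String> reference = new HashSet<>();
    private int loadCount = 0;

    // Called once per task, before any record is processed.
    void setup(Reader sideFile) {
        loadCount++;
        try (BufferedReader in = new BufferedReader(sideFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                reference.add(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called once per input record; reuses the data loaded in setup().
    // Assumed comparison: does the record appear in the side file?
    boolean map(String record) {
        return reference.contains(record.trim());
    }

    int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LoadOnceMapperSketch mapper = new LoadOnceMapperSketch();
        mapper.setup(new StringReader("a\nb\nc\n")); // one load per mapper
        for (String record : new String[] {"a", "x", "c"}) {
            System.out.println(record + " -> " + mapper.map(record));
        }
    }
}
```

With this shape the side file is read once per mapper rather than once per
input record, so the memory cost is roughly the file size per mapper task.
This only works if the file fits in the task's heap.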

Harsh J
