hadoop-mapreduce-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Sharing data in a mapper for all values
Date Mon, 31 Oct 2011 23:45:10 GMT

I have a situation where I am reading a big file from HDFS and then
comparing all the data in that file with each input to the mapper.

Since my mapper is re-reading the entire HDFS file for each of its
inputs, the amount of data it has to read becomes large
(file size * number of inputs to the mapper).

Can someone suggest a way to avoid this by loading the file once per
mapper, so that the mapper can reuse the loaded file for each of the
inputs that it receives?

If this can be done, then each mapper can load the file just once and
then use it for the entire slice of data that it processes.

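(For what it's worth, a common approach is to do the load in the mapper's
once-per-task setup hook, which Hadoop's new API exposes as
`Mapper.setup(Context)`. The sketch below simulates that lifecycle as a
plain, self-contained Java class so the pattern is runnable stand-alone;
the class and method names are illustrative, and in a real job `setup`
would read the HDFS file via `FileSystem.open(Path)` instead of taking a
list.)

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the "load once per mapper task" pattern (hypothetical names).
// In Hadoop's new API, the framework calls Mapper.setup(Context) exactly
// once per task before any map() calls; this plain class mimics that
// lifecycle without a Hadoop dependency.
public class SideDataMapper {
    private Set<String> sideData; // loaded once, reused for every record

    // Stand-in for Mapper.setup(Context): runs once per task.
    // A real job would open the HDFS file here and read it into memory.
    public void setup(List<String> fileLines) {
        sideData = new HashSet<>(fileLines);
    }

    // Stand-in for map(key, value, context): runs once per input record
    // and only consults the already-loaded side data, never re-reading.
    public boolean map(String record) {
        return sideData.contains(record);
    }

    public static void main(String[] args) {
        SideDataMapper m = new SideDataMapper();
        m.setup(Arrays.asList("alice", "bob")); // one load per task
        System.out.println(m.map("alice"));     // in the side data
        System.out.println(m.map("carol"));     // not in the side data
    }
}
```

With this shape the memory cost is file size + one record at a time,
rather than file size * number of inputs.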
Thanks a lot in advance!

Warm regards
