hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: Sharing data in a mapper for all values
Date Mon, 31 Oct 2011 23:52:50 GMT
Yes, you can read the file once in the configure() (old API) or setup()
(new API) method. Save the data in an instance variable of your mapper
class, and it will be accessible to every call to map() in that task.
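
For illustration, here is a minimal sketch of the new-API variant. It is not
taken from the original poster's job: the class name SideDataMapper, the
configuration key example.side.file, and the simple "is this record present in
the file" comparison are all assumptions standing in for the real logic. It
also assumes the file fits in the task's heap and that the Hadoop client
libraries are on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideDataMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // Hypothetical configuration key that names the HDFS file to load.
    public static final String SIDE_FILE_KEY = "example.side.file";

    // Filled once per task in setup(), then only read by map().
    private final Set<String> sideData = new HashSet<String>();

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        String location = context.getConfiguration().get(SIDE_FILE_KEY);
        if (location == null) {
            throw new IOException("Missing configuration key " + SIDE_FILE_KEY);
        }
        Path path = new Path(location);
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        // Read the whole file a single time, one entry per line.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sideData.add(line);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Placeholder comparison: emit input records that appear in the file.
        if (sideData.contains(value.toString())) {
            context.write(value, NullWritable.get());
        }
    }
}
```

The driver would set the key before submitting the job, for example with
job.getConfiguration().set(SideDataMapper.SIDE_FILE_KEY, "<path to your file>").
With the old API, the same loading code would go in configure(JobConf)
instead of setup().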


On Mon, Oct 31, 2011 at 7:45 PM, Arko Provo Mukherjee
<arkoprovomukherjee@gmail.com> wrote:
> Hello,
> I have a situation where I am reading a big file from HDFS and then
> comparing all the data in that file with each input to the mapper.
> Now, since my mapper reads the entire HDFS file for each of its
> inputs, the amount of data it has to read and keep in memory is
> becoming large (file size * number of inputs to the mapper).
> Can we somehow avoid this by loading the file once for each mapper, so that
> the mapper can reuse the loaded file for each of the inputs that it
> receives?
> If this can be done, then for each mapper, I can just load the file once and
> then the mapper can use it for the entire slice of data that it receives.
> Thanks a lot in advance!
> Warm regards
> Arko

Joseph Echeverria
Cloudera, Inc.
