hadoop-mapreduce-user mailing list archives

From Gregory Lawrence <gr...@yahoo-inc.com>
Subject Re: How read index and data file?
Date Thu, 14 Oct 2010 17:21:35 GMT

I'm not sure I fully understand your question, but if you are asking how to read an index
file in addition to the standard job input, you should look into overriding the setup() method
of your Mapper (or Reducer), which runs once per task before any records are processed.
It may look something like the following:

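@Override
public void setup(Context context) throws IOException, InterruptedException {
     Configuration conf = context.getConfiguration();

     // fileName holds the index file's path (not defined in this snippet)
     Path path = new Path(fileName);
     FileSystem fs = path.getFileSystem(conf);
     try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)))) {
          // read the index here, e.g. line by line into a HashMap
     }
}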

The setup() method should also load the index into whatever data structures you need (e.g.,
hash tables). This, of course, assumes that your index file is small enough to fit in memory.
You should also look into using the DistributedCache, which copies the file to each node's
local disk once per job; that should speed things up, especially when multiple Mapper/Reducer
tasks run in sequence on the same machine.
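To make the setup()/map() split a bit more concrete, here is a minimal, stdlib-only sketch of the
same pattern, with the Hadoop classes left out so it compiles on its own. It assumes a hypothetical
tab-separated index format (one key<TAB>value pair per line); the class IndexLookup and its
methods are illustrative names, not part of Hadoop. In a real Mapper, load() would be called from
setup() with new InputStreamReader(fs.open(path)) as its source, and lookup() from map().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: an in-memory index built once, then queried per record.
class IndexLookup {
    // In-memory copy of the index (the role of the hash table built in setup()).
    private final Map<String, String> index = new HashMap<>();

    // Loads "key<TAB>value" lines (a hypothetical index format) into the map.
    void load(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines that have no key/value separator
                }
                index.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called once per input record (the role of map()): joins the record with
    // its index entry, or returns null when the key is not in the index.
    String lookup(String key, String record) {
        String value = index.get(key);
        return value == null ? null : key + "\t" + record + "\t" + value;
    }

    int size() {
        return index.size();
    }

    public static void main(String[] args) {
        IndexLookup lookup = new IndexLookup();
        lookup.load(new StringReader("user1\tgold\nuser2\tsilver\n"));
        System.out.println(lookup.lookup("user1", "clicked"));
        System.out.println(lookup.lookup("user3", "clicked"));
    }
}
```

Because the whole index is held in a HashMap, this only works while the index fits in the
task's heap, which is the same caveat mentioned above.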

Greg Lawrence

On 10/13/10 12:00 PM, "Pedro Costa" <psdc1978@gmail.com> wrote:


I would like to create an example that reads an index file and the data
file that the map function produces as output. Can anyone give
me an example, please?

