hadoop-mapreduce-user mailing list archives

From Panayotis Antonopoulos <antonopoulos...@hotmail.com>
Subject Reducer that processes key/value pairs depending on the key
Date Mon, 25 Apr 2011 01:13:49 GMT

I am a beginner in MapReduce and I am trying to create a forward and an inverted index for
a large number of documents.
I believe that parsing each document twice (once for the forward index and once for the inverted
index) would be inefficient, so I would like to ask whether it would be a good solution to have
the mapper (which parses each document) emit two different kinds of key/value pairs for every
word of every document: (doc_id, word) and (word, doc_id). The first kind is used to build
the forward index and the second the inverted index.
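A minimal plain-Java sketch of that map step might look like the following. It models the logic rather than using the Hadoop API, so it runs without Hadoop on the classpath. `DualIndexMapperSketch`, the `F:`/`I:` key prefixes and the tokenizer regex are all hypothetical choices. The prefixes are one possible way (an assumption, not something stated above) to let the reducer tell a doc_id key from a word key, since both are plain strings and a document ID such as `42` could also occur as a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the proposed mapper; no Hadoop dependency.
// The "F:"/"I:" prefixes are a hypothetical tagging convention, not part of Hadoop.
public class DualIndexMapperSketch {
    static final String FORWARD_TAG = "F:";   // key is a doc_id  -> forward index
    static final String INVERTED_TAG = "I:";  // key is a word    -> inverted index

    // Parses the document once and emits both kinds of pair for every word.
    public static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue; // split yields a leading "" if text starts with punctuation
            out.add(Map.entry(FORWARD_TAG + docId, word));  // (doc_id, word)
            out.add(Map.entry(INVERTED_TAG + word, docId)); // (word, doc_id)
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> kv : map("doc1", "Hadoop maps; Hadoop reduces.")) {
            System.out.println(kv.getKey() + "\t" + kv.getValue());
        }
    }
}
```

Running `main` prints eight tagged pairs for the four tokens, which also shows the cost of this design: every word is emitted twice, so the shuffle carries the intermediate data for both indexes in exchange for parsing each document only once.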

The reducer would check the key and, depending on its kind, write either (doc_id, list_of_words)
or (word, list_of_docs) to the appropriate file (forward or inverted index) using MultipleOutputs.
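The reducer side can be sketched the same way, again in plain Java with no Hadoop dependency. This sketch assumes the mapper prefixed forward-index keys with `F:` and inverted-index keys with `I:` (a hypothetical convention). The two `TreeMap` fields stand in for the two named outputs that MultipleOutputs would write in a real job, and `main` simulates the shuffle by grouping map output by key. Removing duplicate words or doc IDs from each list is also an assumption about what list_of_words and list_of_docs should contain.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of the proposed reducer; no Hadoop dependency.
// Assumes hypothetical "F:" (doc_id) and "I:" (word) key prefixes set by the mapper.
public class DualIndexReducerSketch {
    // Stand-ins for the two named outputs (forward and inverted index files).
    public final Map<String, List<String>> forward = new TreeMap<>();
    public final Map<String, List<String>> inverted = new TreeMap<>();

    // Called once per distinct key with all of its values, like Hadoop's reduce().
    public void reduce(String taggedKey, Iterable<String> values) {
        Set<String> unique = new LinkedHashSet<>(); // drop duplicates, keep first-seen order
        for (String v : values) unique.add(v);
        List<String> list = new ArrayList<>(unique);

        if (taggedKey.startsWith("F:")) {
            forward.put(taggedKey.substring(2), list);   // doc_id -> list_of_words
        } else if (taggedKey.startsWith("I:")) {
            inverted.put(taggedKey.substring(2), list);  // word -> list_of_docs
        } else {
            throw new IllegalArgumentException("untagged key: " + taggedKey);
        }
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"F:doc1", "hadoop"}, {"I:hadoop", "doc1"},
            {"F:doc1", "maps"},   {"I:maps", "doc1"},
            {"F:doc2", "hadoop"}, {"I:hadoop", "doc2"},
        };
        // Simulate the shuffle: group values by key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        DualIndexReducerSketch r = new DualIndexReducerSketch();
        grouped.forEach(r::reduce);
        System.out.println("forward:  " + r.forward);
        System.out.println("inverted: " + r.inverted);
    }
}
```

Running it should print `forward:  {doc1=[hadoop, maps], doc2=[hadoop]}` and `inverted: {hadoop=[doc1, doc2], maps=[doc1]}`. Hadoop does not guarantee the order of values within a reduce call, so a real job would sort each list if order matters. Routing both key kinds through one reduce phase also means both indexes share the same number of reducers and the same partitioning.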

I would like to know whether something like this is a good choice in general, as well as for
the specific problem I describe.

Thank you in advance,