hadoop-mapreduce-user mailing list archives

From Raghava Mutharaju <m.vijayaragh...@gmail.com>
Subject is this architecture possible?
Date Sat, 13 Feb 2010 12:07:42 GMT
Hello all,

Is the following architecture possible?

A distributed key-value store (HBase) is used, so each value has a timestamp
associated with it. Map and Reduce tasks are executed iteratively. In each
iteration, Map should take as input only the values that were added to the
store in the previous iteration (perhaps the ones with the latest timestamp?).
Reduce should take as input Map's output as well as the <key,value> pairs from
the store whose keys match the keys that Reduce has to process in the current
iteration. The output of Reduce goes to the store.
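To make the intended data flow concrete, here is a minimal in-memory sketch in plain Python. It does not use Hadoop or HBase. `TimestampedStore`, `run_iterations`, `reach_map`, `reach_reduce` and the small `GRAPH` are all hypothetical names for illustration, and the sketch assumes iteration numbers are used as cell timestamps. Map reads only the cells written in the previous iteration, and Reduce combines Map's output with whatever the store already holds for the same key. It writes its result back with the current iteration's timestamp and stops once an iteration produces nothing new. The example job is single-source reachability, chosen only because it is small and easy to check.

```python
from collections import defaultdict


class TimestampedStore:
    """Hypothetical in-memory stand-in for an HBase table.

    Each key maps to a list of (timestamp, value) cells, so values written
    in different iterations can be told apart by their timestamp.
    """

    def __init__(self):
        self.cells = defaultdict(list)

    def put(self, key, value, ts):
        self.cells[key].append((ts, value))

    def scan_time_range(self, min_ts, max_ts):
        """Return (key, value) pairs whose timestamp lies in [min_ts, max_ts)."""
        return [
            (key, value)
            for key, cells in self.cells.items()
            for ts, value in cells
            if min_ts <= ts < max_ts
        ]

    def get_values(self, key):
        """Return every stored value for key, oldest first."""
        return [value for _, value in self.cells.get(key, [])]


def run_iterations(store, map_fn, reduce_fn, start_ts=0, max_iters=100):
    """Run map/reduce rounds until one writes nothing; return productive rounds."""
    prev_ts = start_ts
    for ts in range(start_ts + 1, start_ts + 1 + max_iters):
        # Map phase: only cells written in the previous iteration are input.
        grouped = defaultdict(list)
        for key, value in store.scan_time_range(prev_ts, prev_ts + 1):
            for out_key, out_value in map_fn(key, value):
                grouped[out_key].append(out_value)
        # Reduce phase: map output plus the store's existing values for the key;
        # results go back into the store stamped with this iteration.
        wrote = False
        for key in sorted(grouped):
            for new_value in reduce_fn(key, grouped[key], store.get_values(key)):
                store.put(key, new_value, ts)
                wrote = True
        if not wrote:
            return ts - start_ts - 1
        prev_ts = ts
    return max_iters


# Hypothetical example job: which nodes are reachable from "a"?
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}


def reach_map(node, _parent):
    # A node reached in the previous iteration proposes itself as parent
    # of each of its neighbours.
    for neighbour in GRAPH.get(node, []):
        yield neighbour, node


def reach_reduce(node, parents, stored):
    # Only record the node if no earlier iteration has already reached it.
    if not stored:
        yield min(parents)


if __name__ == "__main__":
    store = TimestampedStore()
    store.put("a", None, 0)  # seed iteration 0 with the start node
    print(run_iterations(store, reach_map, reach_reduce))  # → 2
    print(store.get_values("d"))  # → ['b']
```

In Hadoop/HBase terms, the time-range filter on the map side corresponds roughly to restricting the input `Scan` with `setTimeRange`, and the per-key lookup in reduce would be a `Get` against the table from inside the reducer. Treat that mapping as an assumption to verify against the HBase version in use, especially since HBase timestamps default to wall-clock milliseconds rather than iteration numbers.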

If this is possible, which classes (e.g., InputFormat, or the run() method of
the Reducer) should be extended so that the above flow takes place instead of
the regular one? If it is not possible, are there any alternatives that achieve
the same result?

Thank you.

