hadoop-common-user mailing list archives

From Aayush Garg <aayush.g...@gmail.com>
Subject Modeling WordCount in a different way
Date Tue, 07 Apr 2009 00:35:34 GMT

I want to experiment with the wordcount example in a different way.

Suppose we have very large data. Instead of splitting all the data at once,
we want to feed a few splits into the map-reduce job at a time. I want to
model the Hadoop job like this:

Suppose a batch of input splits arrives at the beginning for every map, and
reduce produces the (word, frequency) pairs for this batch of input splits.
After this, another batch of input splits arrives, and the results from the
subsequent reduce are aggregated with the previous results (if the word "that"
had frequency 2 in the previous processing and occurs 1 time in this
processing, then the frequency of "that" is now maintained as 3).
If in the next map-reduce "that" comes 4 times, its frequency is now
maintained as 7.
And this process goes on like this.
Now, how would I model input splits like this, and how can these continuous
map-reduces be kept running? In what way should I keep the results of each
map-reduce so that I can aggregate them with the output of the next one?

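The per-batch aggregation described above — folding a new batch's (word, count) output into a running total — could be sketched in plain Java like this (a minimal illustration of the merge step only, not Hadoop API code; the class and method names are my own):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the aggregation between successive WordCount runs:
// merge the (word, count) output of the latest batch into a running total.
public class RunningWordCount {

    // Merge one batch's counts into the accumulated totals, in place.
    public static void merge(Map<String, Long> totals, Map<String, Long> batch) {
        for (Map.Entry<String, Long> e : batch.entrySet()) {
            // If the word is already present, add the counts; otherwise insert it.
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();

        // Batch 1: "that" appears 2 times.
        merge(totals, Map.of("that", 2L));
        // Batch 2: "that" appears 1 more time -> total 3.
        merge(totals, Map.of("that", 1L));
        // Batch 3: "that" appears 4 more times -> total 7.
        merge(totals, Map.of("that", 4L));

        System.out.println(totals.get("that")); // prints 7
    }
}
```

In Hadoop itself, one common way to get this effect is to write each job's output to a fresh directory and either add the previous output directory as an additional input path to the next job (so the reducer re-sums old and new counts) or run a small merge step like the one above between jobs.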
