hadoop-common-user mailing list archives

From Aayush Garg <aayush.g...@gmail.com>
Subject Re: Modeling WordCount in a different way
Date Tue, 07 Apr 2009 12:35:08 GMT
I am confused about how to start the next job after the previous one has
finished. Could you make it clear with a rough example? Also, do I need to use
SequenceFileInputFormat to keep the results in memory and then access them?

On Tue, Apr 7, 2009 at 10:43 AM, Sharad Agarwal <sharadag@yahoo-inc.com> wrote:

> > Suppose a batch of inputsplits arrive in the beginning to every map, and
> > reduce gives the word, frequency for this batch of inputsplits.
> > Now after this another batch of inputsplits arrive and the results from
> > subsequent reduce are aggregated to the previous results (if the word "that"
> > has frequency 2 in previous processing and in this processing it occurs 1
> > time, then the frequency of "that" is now maintained as 3).
> > In next map-reduce "that" comes 4 times, now its frequency maintained as
> > 7....
> >
> You could merge the result from the previous step in the reducer. If the
> number of unique words is not large, the output from the previous step can
> be loaded into an in-memory hash and used to add the count from the previous
> step to the current step.
> If you expect the list of unique words to be too large to fit in memory, you
> could read the previous step's output directly from HDFS. Since it would be
> a sorted file, you could walk it and merge the counts in a single pass in
> the reduce function.
> - Sharad
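
The two merge strategies described above can be sketched roughly as follows. This is plain Python rather than Hadoop code, only meant to show the merge logic; the function names `merge_in_memory` and `merge_sorted` are hypothetical, and the sketch assumes each step's output is a sequence of (word, count) pairs with each word appearing at most once, and, for the second function, that both inputs are sorted by word in the same order.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, int]


def merge_in_memory(previous: Dict[str, int], current: Iterable[Pair]) -> Dict[str, int]:
    """Small vocabulary: keep the previous totals in a hash map and add to them."""
    totals = Counter(previous)
    for word, count in current:
        totals[word] += count
    return dict(totals)


def merge_sorted(previous: Iterable[Pair], current: Iterable[Pair]) -> Iterator[Pair]:
    """Large vocabulary: walk two word-sorted streams once, never holding them in memory."""
    prev_iter, cur_iter = iter(previous), iter(current)
    prev, cur = next(prev_iter, None), next(cur_iter, None)
    while prev is not None or cur is not None:
        if cur is None or (prev is not None and prev[0] < cur[0]):
            # Word only seen in the previous step: carry its total forward.
            yield prev
            prev = next(prev_iter, None)
        elif prev is None or cur[0] < prev[0]:
            # Word seen for the first time in the current step.
            yield cur
            cur = next(cur_iter, None)
        else:
            # Same word in both streams: add the counts.
            yield (prev[0], prev[1] + cur[1])
            prev, cur = next(prev_iter, None), next(cur_iter, None)


# The example from the thread: "that" occurs 2, then 1, then 4 times.
totals: Dict[str, int] = {}
for batch in ([("that", 2)], [("that", 1)], [("that", 4)]):
    totals = merge_in_memory(totals, batch)
```

After the three batches, `totals` holds a count of 7 for "that", matching the running totals of 2, 3 and 7 described in the question. In a real job, each batch would be one map-reduce run whose merged output becomes the `previous` input of the next run.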

Aayush Garg,
Phone: +41 764822440
