hadoop-common-user mailing list archives

From Aayush Garg <aayush.g...@gmail.com>
Subject Re: Modeling WordCount in a different way
Date Tue, 07 Apr 2009 14:41:28 GMT
I want to investigate whether Hadoop could handle streams,
i.e. data arriving as an infinite stream, with Hadoop used to perform
online aggregation.
Hadoop comes with fault tolerance and other nice features, so these would
be directly usable in such a scenario.

On Tue, Apr 7, 2009 at 4:28 PM, Norbert Burger <norbert.burger@gmail.com>wrote:

> Aayush, out of curiosity, why do you want to model wordcount this way?
> What benefit do you see?
>
> Norbert
>
> On 4/6/09, Aayush Garg <aayush.garg@gmail.com> wrote:
> > Hi,
> >
> >  I want to experiment with the wordcount example in a different way.
> >
> >  Suppose we have very large data. Instead of splitting all the data at
> >  once, we want to feed a few splits into the map-reduce job at a time.
> >  I want to model the Hadoop job like this:
> >
> >  Suppose a batch of inputsplits arrives at the maps to begin with, and
> >  the reduce emits the (word, frequency) pairs for this batch.
> >  Then another batch of inputsplits arrives, and the results from the
> >  subsequent reduce are aggregated with the previous results (if the word
> >  "that" had frequency 2 in the previous run and occurs 1 time in this
> >  run, its frequency is now maintained as 3).
> >  If "that" occurs 4 times in the next map-reduce, its frequency becomes
> >  7...
> >
> >  And this process goes on like this.
> >  How would I model inputsplits this way, and how can these continuous
> >  map-reduces be kept running? In what form should I keep the results of
> >  each map-reduce so that I can aggregate them with the output of the
> >  next one?
> >
> >  Thanks,
> >
> > Aayush
> >
>
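[Editor's note] The batch-wise aggregation described in the quoted message — folding each batch's word counts into a running total — can be sketched as follows. This is a minimal illustration outside Hadoop, not Hadoop API code; in a real job the running totals would typically be the output of the previous reduce, stored in HDFS and re-read (or joined against) by the next job. The function name `merge_batch` is hypothetical.

```python
from collections import Counter

def merge_batch(running_totals, batch_counts):
    """Fold the (word, frequency) output of one map-reduce batch
    into the running totals kept across batches."""
    running_totals.update(batch_counts)  # Counter.update adds counts
    return running_totals

totals = Counter()
merge_batch(totals, Counter({"that": 2}))  # after batch 1, "that" -> 2
merge_batch(totals, Counter({"that": 1}))  # after batch 2, "that" -> 3
merge_batch(totals, Counter({"that": 4}))  # after batch 3, "that" -> 7
```

The key design question in the thread is where `totals` lives between jobs: each map-reduce run is stateless, so the running counts have to be persisted (e.g. as the previous job's output directory) and fed back in alongside the next batch.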



-- 
Aayush Garg
