hadoop-common-user mailing list archives

From Sid123 <itis...@gmail.com>
Subject Re: Iterative feedback in map reduce....
Date Fri, 27 Mar 2009 23:39:19 GMT

Thanks for the help, Peter. It looks like the mappers write everything out
under a common key and all the values end up on HDFS. The mappers will just
serialize behind one another to write to disk. I will be writing the code
for this tonight.

Can you answer a technical question? Since all the values are grouped under
a common key, how many reduce processes do you think will be spawned? I am
thinking 1, which is bad.

So I was thinking of generating the key with a random number generator in
the mapper's collector instead. The values will then be uniformly
distributed over a few keys. Say the number of keys is 0.05% of the number
of values, or at least 1, whichever is higher. With 20000 values, roughly
2000 values should fall under each key, and 10 reducers should spawn to do
the sums in parallel. Now I can at least run 10 sums in parallel rather
than have 1 reducer do the whole job. How does that theory seem?
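
To make the random-key idea concrete, here is a rough local sketch in plain
Python (no Hadoop involved), so treat it as an illustration rather than
working job code. The function names, the 2x2 matrices and the fixed seed
are all made up for the example. It also assumes two things the scheme
depends on: the job is configured with enough reduce tasks for the keys to
spread over (the number of reduce tasks is a job setting, not something
derived from the key count), and there is a second pass that adds the
per-key partial sums into the final matrix.

```python
import random


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def salted_map(matrices, num_keys, rng):
    """Map step: tag each matrix with a random key in [0, num_keys)."""
    for m in matrices:
        yield rng.randrange(num_keys), m


def partial_sums(keyed_matrices):
    """First reduce: keep one running sum per key."""
    sums = {}
    for key, m in keyed_matrices:
        sums[key] = add_matrices(sums[key], m) if key in sums else m
    return sums


def final_sum(partials):
    """Second pass: add the per-key partial sums into a single matrix."""
    total = None
    for m in partials.values():
        total = m if total is None else add_matrices(total, m)
    return total


# Hypothetical data: 20000 identical 2x2 matrices spread over 10 keys.
rng = random.Random(0)
matrices = [[[1, 2], [3, 4]] for _ in range(20000)]
partials = partial_sums(salted_map(matrices, num_keys=10, rng=rng))
total = final_sum(partials)
```

Because matrix addition is associative and commutative, the random grouping
does not change the result; it only changes how the work is split. The cost
is the extra pass over the (at most num_keys) partial sums.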

Peter Skomoroch wrote:
> Check out the EM example in nltk:
> http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk_contrib/hadoop/EM/runStreaming.py
> On Fri, Mar 27, 2009 at 5:19 PM, Sid123 <itissid@gmail.com> wrote:
>> Hi,
>> I have to design an iterative algorithm. Each iteration is an M-R cycle
>> that calculates a parameter and has to feed it back to all the maps in
>> the next iteration.
>> In the reduce procedure I just need to sum everything from the map
>> procedure (many matrices of the same size) into a single matrix of that
>> size, irrespective of the key. This single matrix is the parameter I was
>> talking about earlier.
>> I want to know the following. (PS: this parameter MUST BE global to all
>> map processes.)
>> 1) How do I collect all the values into one single parameter? Do I need
>> to write it to the file system, or can I keep it in memory? I feel that
>> I have to write it to HDFS somewhere...
> -- 
> Peter N. Skomoroch
> 617.285.8348
> http://www.datawrangling.com
> http://delicious.com/pskomoroch
> http://twitter.com/peteskomoroch

View this message in context: http://www.nabble.com/Iterative-feedback-in-map-reduce....-tp22748317p22751900.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
