hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Roelofs <roel...@yahoo-inc.com>
Subject Re: Factorial in Map-Reduce
Date Thu, 14 Oct 2010 23:48:53 GMT
Shi Yu <shiyu@uchicago.edu> wrote:

> I don't think so. If the problem was to find the super large prime 
> number it could be a nice Map/reduce task, but simply calculating the 
> factorial is a monotonically complexity increasing task. If it is split 
> into multiple tasks, how to shuffle the complexity ?

I can think of at least a couple of reasonable ways to do so, particularly
if every mapper knows the total number of maps upfront.  The amount of work
done by each mapper would be almost the same.  (Remember, multiplication is
commutative. ;-) )

> Some easy tasks will 
> be finished earlier and just hang there waiting for the complicated 
> ones.

Not if you split up the work intelligently.

> Later, the reducer part still need to take much burden to multiply 
> all the numbers back, which probably won't gain much efficiency through 
> the Map/Reduce fashion.

This is arguably a map-only (well, map-mostly) situation.  Consider a first-
stage job wherein each of 1000 maps produces a billion-digit result; perhaps
ten maps would suffice for a second stage, which could then have a single
reducer to handle multiplication of the final set of 100-billion-digit values.

No doubt there are better ways to divide up the work, depending on whether
speed or network bandwidth or consumption of intermediate storage space is
most critical.  I'll leave the calculations to those who care about the
result. :-)

Greg

Mime
View raw message