hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Re: Factorial in Map-Reduce
Date Fri, 15 Oct 2010 00:17:59 GMT
For the simplest case, the natural-number factorial, the arrangement of the 
job may seem obvious, but it is not entirely so: a new data type needs to be 
created to store the super-large numbers (like http://gmplib.org/), and a 
package then needs to be developed to support it in map/reduce.
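
To make the data-type point concrete, a rough sketch (untested, and assuming 
a made-up input format of one "start end" range per line; the class name 
PartialProductMapper is hypothetical) could use Java's built-in 
java.math.BigInteger in place of a GMP-style type:

// Sketch only: assumes input lines of the form "start end", one range per
// line, and emits the partial product of that range as a decimal string
// under a single key.
import java.io.IOException;
import java.math.BigInteger;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PartialProductMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private static final Text KEY = new Text("factorial");

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] parts = line.toString().trim().split("\\s+");
    long start = Long.parseLong(parts[0]);
    long end   = Long.parseLong(parts[1]);        // inclusive

    BigInteger product = BigInteger.ONE;
    for (long i = start; i <= end; i++) {
      product = product.multiply(BigInteger.valueOf(i));
    }
    // Emit the partial product as text; a real job might prefer a binary
    // encoding to keep the intermediate data smaller.
    context.write(KEY, new Text(product.toString()));
  }
}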

There are other factorials as well, such as the floating-point factorial, the 
hyperfactorial, the complex-number factorial, the exponential factorial, and 
the factorial of an array of numbers, which make investigating the 
distribution of candidate numbers and estimating the complexity of the 
subtasks as complicated as the multiplication itself.

If there is a single reducer to do the final job, then in theory that reducer 
should be capable of doing the whole job. The overall memory complexity is 
bounded by the last multiplication.
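
As a minimal sketch of that single-reducer stage (again untested, and 
assuming the partial products arrive as decimal strings under one key, as in 
the hypothetical mapper above), the reducer would simply multiply everything 
back together, so its memory use is dominated by the final product:

import java.io.IOException;
import java.math.BigInteger;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: a single reducer that multiplies all partial products
// together. Its working set is dominated by the final result, so it must be
// able to hold the complete factorial in memory.
public class ProductReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text key, Iterable<Text> partials, Context context)
      throws IOException, InterruptedException {
    BigInteger result = BigInteger.ONE;
    for (Text partial : partials) {
      result = result.multiply(new BigInteger(partial.toString()));
    }
    context.write(key, new Text(result.toString()));
  }
}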

Fortunately, I am not interested in this problem and don't even need to 
think more about it.

On 2010-10-14 18:48, Greg Roelofs wrote:
> Shi Yu <shiyu@uchicago.edu> wrote:
>
>    
>> I don't think so. If the problem were to find a super-large prime
>> number, it could be a nice Map/Reduce task, but simply calculating the
>> factorial is a task whose complexity increases monotonically. If it is
>> split into multiple tasks, how do you shuffle the complexity?
>>      
> I can think of at least a couple of reasonable ways to do so, particularly
> if every mapper knows the total number of maps upfront.  The amount of work
> done by each mapper would be almost the same.  (Remember, multiplication is
> commutative. ;-) )
>
>    
>> Some easy tasks will finish earlier and just hang there waiting for
>> the complicated ones.
>>      
> Not if you split up the work intelligently.
>
>    
>> Later, the reducer part still needs to take on much of the burden of
>> multiplying all the numbers back together, which probably won't gain
>> much efficiency in the Map/Reduce fashion.
>>      
> This is arguably a map-only (well, map-mostly) situation.  Consider a first-
> stage job wherein each of 1000 maps produces a billion-digit result; perhaps
> ten maps would suffice for a second stage, which could then have a single
> reducer to handle multiplication of the final set of 100-billion-digit values.
>
> No doubt there are better ways to divide up the work, depending on whether
> speed or network bandwidth or consumption of intermediate storage space is
> most critical.  I'll leave the calculations to those who care about the
> result. :-)
>
> Greg
>    
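
For what it's worth, one way to read Greg's point above about each mapper 
doing almost the same amount of work is an interleaved (strided) split: slice 
i takes the factors i+1, i+1+m, i+1+2m, ..., so every slice mixes small and 
large factors, and because multiplication is commutative the partial products 
can be combined in any order. A small stand-alone Java sketch (no Hadoop 
involved, names made up) that checks the idea on a toy n:

import java.math.BigInteger;

// Sketch: partition 1..n into m interleaved (strided) slices so that every
// slice contains a similar mix of small and large factors, compute the
// partial product of each slice, then multiply the partial products together.
public class StridedFactorial {

  static BigInteger partialProduct(long n, int m, int slice) {
    BigInteger p = BigInteger.ONE;
    for (long i = slice + 1; i <= n; i += m) {   // factors slice+1, slice+1+m, ...
      p = p.multiply(BigInteger.valueOf(i));
    }
    return p;
  }

  public static void main(String[] args) {
    long n = 1000;    // small example
    int m = 8;        // number of "mappers"

    BigInteger result = BigInteger.ONE;
    for (int slice = 0; slice < m; slice++) {
      result = result.multiply(partialProduct(n, m, slice));
    }

    // Check against the straightforward product 2 * 3 * ... * n.
    BigInteger direct = BigInteger.ONE;
    for (long i = 2; i <= n; i++) {
      direct = direct.multiply(BigInteger.valueOf(i));
    }
    System.out.println("strided == direct ? " + result.equals(direct));
  }
}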

