For the simplest case, the natural number factorial, the arrangements of
job maybe is obvious, but not absolutely. Because a new variable "type"
need to be created to store the super large numbers (like
http://gmplib.org/), and then a package need to be developed to support
map/reduce.
There are other factorials, like the float number factorial, the hyper
factorial, the complex number factorial, the exponential factorial, and
array number factorial, which makes investigating the distribution of
candidate numbers / complexity estimation of subtasks as much
complicated as the multiplication itself.
If there is a single reducer to do the final job, then by theory that
reducer should be capable to do the whole job. The overall memory
complexity is bounded by the last multiplication.
Fortunately, I am not interested in this problem and don't even need to
think more about it.
But there are many other factorial,
On 20101014 18:48, Greg Roelofs wrote:
> Shi Yu<shiyu@uchicago.edu> wrote:
>
>
>> I don't think so. If the problem was to find the super large prime
>> number it could be a nice Map/reduce task, but simply calculating the
>> factorial is a monotonically complexity increasing task. If it is split
>> into multiple tasks, how to shuffle the complexity ?
>>
> I can think of at least a couple of reasonable ways to do so, particularly
> if every mapper knows the total number of maps upfront. The amount of work
> done by each mapper would be almost the same. (Remember, multiplication is
> commutative. ;) )
>
>
>> Some easy tasks will
>> be finished earlier and just hang there waiting for the complicated
>> ones.
>>
> Not if you split up the work intelligently.
>
>
>> Later, the reducer part still need to take much burden to multiply
>> all the numbers back, which probably won't gain much efficiency through
>> the Map/Reduce fashion.
>>
> This is arguably a maponly (well, mapmostly) situation. Consider a first
> stage job wherein each of 1000 maps produces a billiondigit result; perhaps
> ten maps would suffice for a second stage, which could then have a single
> reducer to handle multiplication of the final set of 100billiondigit values.
>
> No doubt there are better ways to divide up the work, depending on whether
> speed or network bandwidth or consumption of intermediate storage space is
> most critical. I'll leave the calculations to those who care about the
> result. :)
>
> Greg
>
