The naive matrix-multiplication algorithm is highly parallelizable if
you have the data available locally at all the nodes. The persistent
storage issue was one of the first problems that I tried solving (HDFS
is just wrong for the access patterns in matrix algorithms).
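To illustrate the point about locality, here is a minimal sketch of the naive algorithm split into independent output tiles. It is not the storage layer or code discussed in this thread: the names `multiply_tile` and `block_matmul`, the tile size, and the use of a thread pool as a stand-in for cluster nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def multiply_tile(a, b, rows, cols):
    """Compute one output tile of a*b for the given row and column ranges."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in cols]
            for i in rows]


def block_matmul(a, b, block=2, workers=4):
    """Naive matrix product, split into independent output tiles.

    Hypothetical helper: threads stand in for nodes that already hold
    the rows of `a` and columns of `b` they need.
    """
    n, m = len(a), len(b[0])
    row_ranges = [range(r, min(r + block, n)) for r in range(0, n, block)]
    col_ranges = [range(c, min(c + block, m)) for c in range(0, m, block)]
    tiles = list(product(row_ranges, col_ranges))

    # Each task reads only its own rows of a and columns of b and writes
    # its own tile, so tasks need no coordination with each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: multiply_tile(a, b, *t), tiles))

    # Stitch the finished tiles back into the full result matrix.
    c = [[0] * m for _ in range(n)]
    for (rows, cols), tile in zip(tiles, results):
        for ti, i in enumerate(rows):
            for tj, j in enumerate(cols):
                c[i][j] = tile[ti][tj]
    return c
```

The only shared step is the final stitching of tiles, which is why the cost of getting the right rows and columns to each node tends to matter more than the arithmetic itself.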
I can't compete with Matlab yet! But I am planning to add support for
SSE2 instructions, so I might get close. Also, I don't have systems with
64GB of RAM or 14 cores in one place :(
I hope to get much better results in a month or two.
On Mon, Apr 12, 2010 at 12:27 AM, Steven Buss wrote:
> If you're just doing matrix multiplication, I would advise that mahout
> (or any mapreduce approach) isn't well suited to your problem. I did
> the same computation with matlab (multiplying two 40k x 40k random
> double precision dense matrices) using 14 cores and about 36GB of ram
> on a single machine* and it finished in about 55 minutes. If I'm
> reading your email correctly, you were working with 34*2*4=272 cores!
> I'm not sure if dense matrix multiplication can actually be
> efficiently mapreduced, but I am still a rookie so don't take my word
> for it.
> *The machine I am working on has 8 dual core AMD opteron 875s @ 2.2GHz
> per core, with 64GB total system memory.
>
> On Sun, Apr 11, 2010 at 11:53 PM, Ted Dunning wrote:
>> Vimal,
>>
>> We don't have any distributed dense multiplication operations because we
>> have not yet found much application demand for distributed dense matrix
>> multiplication. Distributed sparse matrix operations are a big deal,
>> however.
>>
>> If you are interested in working on the problem in the context of Mahout, we
>> would love to help. This is especially true if you have an application that
>> needs dense operations and could benefit from some of the other capabilities
>> in Mahout.
>>
>> On Sun, Apr 11, 2010 at 1:27 PM, Vimal Mathew wrote:
>>
>>> Hi,
>>> What's the current state of matrix-matrix multiplication in Mahout?
>>> Are there any performance results available for large matrices?
>>> I have been working on a Hadoop-compatible distributed storage for
>>> matrices. I can currently multiply two 40K x 40K dense double
>>> precision matrices in around 1 hour using 34 systems (16GB RAM, two
>>> Core2Quads per node). I was wondering how this compares with Mahout.
>>> Regards,
>>> Vimal
