mahout-user mailing list archives

From Vimal Mathew <vml.mat...@gmail.com>
Subject Re: Current state of (dense) matrix multiplication?
Date Mon, 12 Apr 2010 11:15:47 GMT
The naive matrix-multiplication algorithm is highly parallelizable if
you have the data available locally at all the nodes. The persistent
storage issue was one of the first problems that I tried solving (HDFS
is just wrong for the access patterns in matrix algorithms).
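
To illustrate why the naive algorithm parallelizes well once the data is local, here is a minimal sketch in Python. It is not the storage system or code discussed in this thread; the function names and the row-block split are assumptions made for illustration. Each worker gets one block of rows of A plus all of B, and no worker needs anything from any other worker.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row_block(a_rows, b):
    """Naive triple loop: multiply a block of rows of A by the full matrix B."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_rows
    ]


def parallel_matmul(a, b, workers=4):
    """Split A into contiguous row blocks and multiply each block independently.

    Hypothetical helper: every task needs its own rows of A and all of B,
    which is the "data available locally" requirement.
    """
    block = max(1, -(-len(a) // workers))  # ceiling division
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(multiply_row_block, blocks, [b] * len(blocks)))
    # Blocks come back in submission order, so the rows stay in order.
    return [row for part in parts for row in part]
```

Threads are used here only to keep the sketch short; in CPython they will not speed up pure-Python arithmetic. On a cluster each block would go to a different node, and the expensive part becomes getting B (or the right tiles of it) to every node.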

I can't compete with Matlab yet! But I am planning to add support for
SSE2 instructions, so I might get close. Also, I don't have systems with
64 GB of RAM or 14 cores in one place :(
I hope to get much better results in a month or two.
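
For a rough sense of scale, the figures quoted in this thread can be turned into a back-of-envelope estimate. This sketch assumes the usual count of about 2*n^3 floating-point operations for a dense n x n multiply, and treats "around 1 hour" as exactly 3600 seconds; the variable and function names are only for illustration.

```python
n = 40_000                      # matrix dimension from the thread
bytes_per_matrix = n * n * 8    # 8 bytes per double-precision entry
flops = 2 * n ** 3              # approx. multiply-add count for a dense n x n product


def gflops(seconds):
    """Sustained rate in GFLOP/s for the whole multiplication."""
    return flops / seconds / 1e9


matlab_rate = gflops(55 * 60)   # single machine, 14 cores, about 55 minutes
hadoop_rate = gflops(60 * 60)   # 34 nodes, assumed exactly 1 hour

print(f"one matrix: {bytes_per_matrix / 1e9:.1f} GB")
print(f"Matlab run: {matlab_rate:.1f} GFLOP/s")
print(f"34-node run: {hadoop_rate:.1f} GFLOP/s")
```

Under these assumptions each matrix takes 12.8 GB, so the two inputs plus the result come to about 38.4 GB, and both runs sustain somewhere in the range of 35 to 40 GFLOP/s overall.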


On Mon, Apr 12, 2010 at 12:27 AM, Steven Buss <steven.buss@gmail.com> wrote:
> If you're just doing matrix multiplication, I would advise that mahout
> (or any mapreduce approach) isn't well suited to your problem. I did
> the same computation with matlab (multiplying two 40k x 40k random
> double precision dense matrices) using 14 cores and about 36GB of ram
> on a single machine* and it finished in about 55 minutes. If I'm
> reading your email correctly, you were working with 34*2*4=272 cores!
> I'm not sure if dense matrix multiplication can actually be
> efficiently mapreduced, but I am still a rookie so don't take my word
> for it.
>
> *The machine I am working on has 8 dual core AMD opteron 875s @ 2.2GHz
> per core, with 64GB total system memory.
>
> Steven Buss
> steven.buss@gmail.com
> http://www.stevenbuss.com/
>
>
>
> On Sun, Apr 11, 2010 at 11:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Vimal,
>>
>> We don't have any distributed dense multiplication operations because we
>> have not yet found much application demand for distributed dense matrix
>> multiplication.  Distributed sparse matrix operations are a big deal,
>> however.
>>
>> If you are interested in working on the problem in the context of Mahout, we
>> would love to help.  This is especially true if you have an application that
>> needs dense operations and could benefit from some of the other capabilities
>> in Mahout.
>>
>> On Sun, Apr 11, 2010 at 1:27 PM, Vimal Mathew <vml.mathew@gmail.com> wrote:
>>
>>> Hi,
>>>  What's the current state of matrix-matrix multiplication in Mahout?
>>> Are there any performance results available for large matrices?
>>>
>>>  I have been working on a Hadoop-compatible distributed storage for
>>> matrices. I can currently multiply two 40K x 40K dense double
>>> precision matrices in around 1 hour using 34 systems (16GB RAM, two
>>> Core 2 Quads per node). I was wondering how this compares with Mahout.
>>>
>>> Regards,
>>>  Vimal
>>>
>>
>
