spark-dev mailing list archives

From 顾荣 <gurongwal...@gmail.com>
Subject Re: matrix computation in spark
Date Tue, 18 Nov 2014 05:49:25 GMT
Hey Yuxi,

We have also implemented a distributed matrix multiplication library at
PasaLab. The repo is hosted at https://github.com/PasaLab/marlin . We
implemented three distributed matrix multiplication algorithms on Spark. In
our experience, communication-optimal does not always mean optimal overall.
Thus, besides the CARMA matrix multiplication you mentioned, we also
implemented block-splitting matrix multiplication and broadcast matrix
multiplication. They are more efficient than CARMA in some situations, for
example when a large matrix is multiplied by a small matrix.
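
To make the two extra strategies concrete, here is a rough single-machine
sketch in plain Python. It is not Marlin's API and not Spark code: all
function names are hypothetical, the "partitions" are just Python lists, and
the block size is an arbitrary choice. It only shows the shape of each idea:
broadcast ships the whole small matrix to every row slab of the large one,
while block-splitting multiplies matching sub-blocks and sums the partial
products.

```python
# Illustrative only: plain-Python stand-ins for two of the strategies named
# above. This is NOT Marlin's API; every name here is hypothetical.

def matmul(a, b):
    """Dense product of two row-major list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def broadcast_multiply(row_partitions, small):
    """Broadcast strategy: every partition holds a slab of rows of the large
    matrix and receives a full copy of the small matrix, so no shuffle of the
    large matrix is needed."""
    out = []
    for slab in row_partitions:  # on a cluster this loop would run per partition
        out.extend(matmul(slab, small))
    return out

def split_blocks(m, bs):
    """Cut a matrix into {(block_row, block_col): block}; edge blocks may be smaller."""
    blocks = {}
    for i in range(0, len(m), bs):
        for j in range(0, len(m[0]), bs):
            blocks[(i // bs, j // bs)] = [row[j:j + bs] for row in m[i:i + bs]]
    return blocks

def block_multiply(a, b, bs):
    """Block-splitting strategy: C[i,j] = sum over k of A[i,k] * B[k,j],
    where the indices refer to blocks rather than single entries."""
    a_blocks, b_blocks = split_blocks(a, bs), split_blocks(b, bs)
    partial = {}
    for (i, k), a_blk in a_blocks.items():
        for (k2, j), b_blk in b_blocks.items():
            if k != k2:
                continue  # only blocks sharing the inner index are multiplied
            prod = matmul(a_blk, b_blk)
            if (i, j) in partial:
                partial[(i, j)] = [[x + y for x, y in zip(r1, r2)]
                                   for r1, r2 in zip(partial[(i, j)], prod)]
            else:
                partial[(i, j)] = prod
    # Stitch the result blocks back into one dense matrix.
    c = [[0] * len(b[0]) for _ in range(len(a))]
    for (i, j), blk in partial.items():
        for r, row in enumerate(blk):
            for col, value in enumerate(row):
                c[i * bs + r][j * bs + col] = value
    return c

large = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # 4 x 3
small = [[1, 0], [0, 1], [1, 1]]                         # 3 x 2

via_broadcast = broadcast_multiply([large[:2], large[2:]], small)
via_blocks = block_multiply(large, small, 2)
```

Both calls should give the same product as a direct multiplication. On a real
cluster the difference is in what moves over the network: the broadcast
version copies only the small matrix, while the block version has to bring
matching blocks of both matrices together.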

Actually, we presented this work at the Spark Meetup in Beijing on October
26th ( http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ).
The slides can be downloaded from the archive here:
http://pan.baidu.com/s/1dDoyHX3#path=%252Fmeetup-3rd

Best,
Rong

2014-11-18 13:11 GMT+08:00 顾荣 <gurongwalker@gmail.com>:

> Hey Yuxi,
>
> We have also implemented a distributed matrix multiplication library at
> PasaLab. The repo is hosted at https://github.com/PasaLab/marlin . We
> implemented three distributed matrix multiplication algorithms on Spark. In
> our experience, communication-optimal does not always mean optimal overall.
> Thus, besides the CARMA matrix multiplication you mentioned, we also
> implemented block-splitting matrix multiplication and broadcast matrix
> multiplication. They are more efficient than CARMA in some situations, for
> example when a large matrix is multiplied by a small matrix.
>
> Actually, we presented this work at the Spark Meetup in Beijing on October
> 26th ( http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ).
> The slides are also attached to this mail.
>
> Best,
> Rong
>
> 2014-11-18 11:36 GMT+08:00 Zongheng Yang <zongheng.y@gmail.com>:
>
>> There's been some work at the AMPLab on a distributed matrix library on
>> top of Spark; see here [1]. In particular, the repo contains a couple of
>> factorization algorithms.
>>
>> [1] https://github.com/amplab/ml-matrix
>>
>> Zongheng
>>
>> On Mon Nov 17 2014 at 7:34:17 PM liaoyuxi <liaoyuxi@huawei.com> wrote:
>>
>> > Hi,
>> > Matrix computation is critical to the efficiency of algorithms such as
>> > least squares, Kalman filtering, and so on.
>> > For now, the MLlib module offers limited linear algebra on matrices,
>> > especially for distributed matrices.
>> >
>> > We have been working on establishing distributed matrix computation APIs
>> > based on data structures in MLlib.
>> > The main idea is to partition the matrix into sub-blocks, based on the
>> > strategy in the following paper.
>> > http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf
>> > In our experiment, it's communication-optimal.
>> > But operations like factorization may not be appropriate to carry out in
>> > blocks.
>> >
>> > Any suggestions and guidance are welcome.
>> >
>> > Thanks,
>> > Yuxi
>> >
>> >
>>
>
>
>
> --
> ------------------
> Rong Gu
> Department of Computer Science and Technology
> State Key Laboratory for Novel Software Technology
> Nanjing University
> Phone: +86 15850682791
> Email: gurongwalker@gmail.com
> Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
>



