mahout-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Performance of ALS
Date Thu, 18 Apr 2013 20:56:52 GMT
I'm always interested in optimizing the part where you solve Ax = b, which
I went on about so recently. That's a dense-matrix problem. Is there a QR
decomposition available?

I tried getting this part to run on a GPU, and it worked, but it wasn't
faster. Pushing the smallish dense matrix onto the card so many times per
second was somehow still slower overall. The same issue is identified here,
so I'm interested to hear whether the direct-buffer approach makes it a win.
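For readers of the archive: the system in question is the per-user (or per-item) normal-equations solve inside ALS. A minimal NumPy sketch, illustrative only with made-up sizes (Mahout's actual code is Java), comparing the generic LU-based solve, a Cholesky solve that exploits the symmetric positive-definite structure, and the QR route asked about above:

```python
import numpy as np

# Illustrative sketch: the ALS per-user solve minimizes
# ||Y x - p||^2 + lam ||x||^2, whose normal equations are
# (Y^T Y + lam I) x = Y^T p -- a small dense symmetric positive-definite system.
rng = np.random.default_rng(0)
k = 100                                  # factorization rank, as in the 1M test
Y = rng.standard_normal((500, k))        # item-factor block for one user (made up)
p = rng.standard_normal(500)             # that user's ratings (made up)
lam = 0.05

A = Y.T @ Y + lam * np.eye(k)
b = Y.T @ p

# 1) Generic dense solve (LAPACK gesv, LU-based).
x_lu = np.linalg.solve(A, b)

# 2) Cholesky: exploits symmetry/positive definiteness, roughly half the flops.
L = np.linalg.cholesky(A)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

# 3) QR on the regularized tall system [Y; sqrt(lam) I] -- never forms A,
#    so the conditioning is kappa(Y) rather than kappa(Y)^2.
Q, R = np.linalg.qr(np.vstack([Y, np.sqrt(lam) * np.eye(k)]))
x_qr = np.linalg.solve(R, Q.T @ np.concatenate([p, np.zeros(k)]))

assert np.allclose(x_lu, x_chol) and np.allclose(x_lu, x_qr)
```

All three agree on a well-conditioned system; the trade-off is speed (Cholesky) versus numerical robustness (QR).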

On Thu, Apr 18, 2013 at 9:51 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> I looked at jblas a year or two ago.
>
> It's a fast bridge to LAPACK, and LAPACK is very hard to beat. But I think
> I convinced myself that it lacks support for sparse matrices. It should
> still work nicely for many blockified algorithms, such as ALS-WR, which
> try to avoid doing BLAS level-3 operations on sparse data.
>
>
> On Thu, Apr 18, 2013 at 1:45 PM, Robin Anil <robin.anil@gmail.com> wrote:
>
>> BTW, did this include the changes I made in trunk recently? I would also
>> like to profile that code and see if we can squeeze more out of our Vectors
>> and Matrices. Could you point me to how I can run the 1M example?
>>
>> Robin
>>
>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>
>>
>> On Thu, Apr 18, 2013 at 3:43 PM, Robin Anil <robin.anil@gmail.com> wrote:
>>
>> > I was just emailing about something similar on Mahout (see my email). I
>> > saw the TU Berlin name and thought you would do something about it :)
>> > This is excellent. Investigating this may be one of the next-generation
>> > pieces of work on our Vectors.
>> >
>> >
>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >
>> >
>> > On Thu, Apr 18, 2013 at 3:37 PM, Sebastian Schelter <ssc@apache.org
>> >wrote:
>> >
>> >> Hi there,
>> >>
>> >> With regard to Robin mentioning JBlas [1] recently when we talked about
>> >> the performance of our vector operations: I ported the solving code for
>> >> ALS to JBlas today and got some awesome results.
>> >>
>> >> For the MovieLens 1M dataset and a factorization of rank 100, the
>> >> runtime per iteration dropped from 50 seconds to less than 7 seconds. I
>> >> will run some tests with the distributed version and larger datasets in
>> >> the next few days, but from what I've seen, we should really take a
>> >> closer look at JBlas, at least for operations on dense matrices.
>> >>
>> >> Best,
>> >> Sebastian
>> >>
>> >> [1] http://mikiobraun.github.io/jblas/
>> >>
>> >
>> >
>>
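To make the shape of Sebastian's experiment concrete: a rank-k ALS iteration is dominated by many independent k-by-k dense solves, one per user and one per item, which is why a faster dense backend pays off so directly. A toy NumPy sketch of that loop, purely illustrative (toy sizes and plain ridge regularization; the actual Mahout/JBlas code is Java, and ALS-WR additionally scales lambda by each row's rating count):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k, lam = 200, 300, 10, 0.1   # toy sizes, not MovieLens 1M
# Synthetic sparse ratings: each user rates roughly 5% of items.
R = rng.random((n_users, n_items)) * (rng.random((n_users, n_items)) < 0.05)
U = rng.standard_normal((n_users, k))
V = rng.standard_normal((n_items, k))

def objective(U, V):
    err = (R - U @ V.T) * (R != 0)             # error on observed entries only
    return (err ** 2).sum() + lam * ((U ** 2).sum() + (V ** 2).sum())

def solve_side(R, V, lam):
    """Half an ALS sweep: one small dense k x k solve per row of R."""
    k = V.shape[1]
    out = np.zeros((R.shape[0], k))
    for i, row in enumerate(R):
        idx = np.nonzero(row)[0]               # items this user actually rated
        Vi = V[idx]                            # gather into a dense block
        A = Vi.T @ Vi + lam * np.eye(k)        # normal equations (ridge)
        out[i] = np.linalg.solve(A, Vi.T @ row[idx])
    return out

obj_start = objective(U, V)
for _ in range(3):                             # each sweep can only lower the objective
    U = solve_side(R, V, lam)
    V = solve_side(R.T, U, lam)
assert objective(U, V) < obj_start
```

The inner `np.linalg.solve` on a k-by-k matrix is exactly the hot spot the thread is discussing: thousands of small dense solves per iteration, which is also why shipping each one to a GPU individually loses to a fast CPU LAPACK.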
