mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sebastian Schelter <...@apache.org>
Subject Re: mahout collections updates
Date Tue, 12 Mar 2013 22:42:19 GMT
Using this code in DenseVector gives me similar performance:

  @Override
  public double dot(Vector x) {
    if (!x.isDense()) {
      return super.dot(x);
    } else {
      double sum = 0;
      final int size = size();
      for (int n = 0; n < size; n++) {
        sum += values[n] * x.getQuick(n);
      }
      return sum;
    }
  }

On 12.03.2013 23:18, Jake Mannix wrote:
> Ok now that's even weirder than I thought!  You're still calling vector
> methods, you're not doing direct array access or anything...
> 
> On Tuesday, March 12, 2013, Sebastian Schelter wrote:
> 
>> I wrote a small benchmark as we investigated this issue recently for
>> computing recommendations from an ALS factorization.
>>
>> https://gist.github.com/sscdotopen/5147521
>>
>> Here's some results on my notebook, indicating a 5x to 6x performance
>> improvement:
>>
>> dot() 570ms, direct 98ms
>> dot() 568ms, direct 97ms
>> dot() 559ms, direct 97ms
>> dot() 566ms, direct 98ms
>> dot() 574ms, direct 98ms
>> dot() 582ms, direct 101ms
>> dot() 581ms, direct 98ms
>> dot() 576ms, direct 98ms
>> dot() 574ms, direct 97ms
>> dot() 574ms, direct 99ms
>> dot() 564ms, direct 98ms
>> dot() 576ms, direct 97ms
>> dot() 580ms, direct 100ms
>>
>>
>>
>> On 12.03.2013 23:00, Sean Owen wrote:
>>> It's almost certainly the overhead of the iterator creation and iterator
>>> methods.
>>> DenseVector.dot() is not specialized and the simple dot product method
>> here
>>> could as well be placed there. Then the call to DenseVector.dot() would
>> be
>>> equally unsurprisingly fast.
>>>
>>>
>>> On Tue, Mar 12, 2013 at 9:56 PM, Jake Mannix <jake.mannix@gmail.com<javascript:;>>
>> wrote:
>>>
>>>> But then where does it slow down?  It just wraps a double[]
>>>>
>>>> On Tuesday, March 12, 2013, Sebastian Schelter wrote:
>>>>
>>>>> I looked into DenseVector and it doesn't use any primitive collections,
>>>>> so ignore my last mail :)
>>>>>
>>>>> On 12.03.2013 22:16, Sebastian Schelter wrote:
>>>>>> As a sidenote: I was kinda shocked recently, that switching from
>>>>>> DenseVector's dot() method to a direct dot product computation gave
a
>>>> 3x
>>>>>> increase in performance in
>>>>>> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.
>>>>>>
>>>>>> It seems like we really have a performance problem for some usecases.
>>>>>>
>>>>>> On 12.03.2013 22:04, Dawid Weiss wrote:
>>>>>>>> The primary use case for mahout collections is directly *inside*
of
>>>>>>>> our Vector interface.  Which is to say, it's not directly
exposed to
>>>>>>>> most users, and we don't really expose the ability to do
guava
>>>>> collections
>>>>>>>> stuff on them at all: We Do Math. :)  So in particular, we
don't
>>>> expose
>>>>>>>
>>>>>>> Fair enough. But you might want to expose some of it at some
point
>> and
>>>>>>> if this happens it
>>>>>>> may just be ready for you.
>>>>>>>
>>>>>>>> Question is whether there's anything to be gained by just
swapping
>>>>>>>> our own collections *out* for something else, like HPPC or
fastutil.
>>>>>>>
>>>>>>> Depends. Speed optimizations may be one reason -- you'd need
to check
>>>>>>> if the code gains anything by using these libraries compared
to
>> Mahout
>>>>>>> collections. While microbenchmarks may show large differences
my bet
>>>>>>> is that overall results, taking into account
>>>>>>> computations and, God forbid, I/O, will be within noise range
unless
>>>>>>> you're really using these data structures a *lot* in tight loops.
The
>>>>>>> only practical benefit I see is getting rid of a chunk of code
you
>>>>>>> don't wish to
>>>>>>> maintain (like you said: missing features, unit tests, etc.).
But I
>>>>>>> don't negate there is some entertainment value in going back
to such
>>>>>>> fundamental data structures and trying to squeeze the last bit
of
>>>>>>> performance out of them. :)
>>>>>>>
>>>>>>> Dawid
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>>   -jake
>>>>
>>>
>>
>>
> 


Mime
View raw message