mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1640) Better collections would significantly improve vector-operation speed
Date Tue, 08 Mar 2016 11:42:41 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184814#comment-15184814 ]

ASF GitHub Bot commented on MAHOUT-1640:
----------------------------------------

Github user smarthi commented on the pull request:

    https://github.com/apache/mahout/pull/81#issuecomment-193745406
  
    Let's merge this first; please create another PR for SparseMatrix. Thanks again for this.
    
    Sent from my iPhone
    
    > On Mar 8, 2016, at 3:50 AM, Sebastiano Vigna <notifications@github.com> wrote:
    > 
    > BTW, I noticed that SparseMatrix can undergo the same improvement with near-to-zero effort. Do you want me to pack everything into this pull request or do you prefer to merge this part and then I'll work on SparseMatrix?
    > 



> Better collections would significantly improve vector-operation speed
> ---------------------------------------------------------------------
>
>                 Key: MAHOUT-1640
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1640
>             Project: Mahout
>          Issue Type: Improvement
>          Components: collections
>         Environment: Darwin lithium.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64 i386 MacBookPro10,1 Darwin
> java version "1.8.0_31"
> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>            Reporter: Sebastiano Vigna
>            Assignee: Suneel Marthi
>              Labels: legacy, math, scala
>         Attachments: fastutil.patch, speed-fastutil, speed-std
>
>
> The collections currently used by Mahout to implement sparse vectors are extremely slow. The proposed patch (localized to RandomAccessSparseVector) uses fastutil's maps, and the speed improvements in vector benchmarks are very significant. It would be interesting to see whether these improvements percolate to high-level classes using sparse vectors.
> I had to patch two unit tests (an off-by-one bug and an overfitting bug; both were exposed by the different order in which key/value pairs are returned by iterators).
> The included files speed-std and speed-fastutil show the speed improvement. Some more speed might be gained by using the standard java.util.Map.Entry interface everywhere instead of Element.
> DISCLAIMER: The "Times" set of tests has been run multiplying two identical vectors. The standard tests multiply two random vectors, so in fact they just test the speed of the underlying map remove() method, as almost all products are zero. This is not very realistic and was heavily penalizing fastutil's "true deletions". Better tests, with a typical overlap of nonzero entries, would be even more realistic.
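
The disclaimer's point about random vectors can be illustrated with a minimal sketch. It is not Mahout's or fastutil's code: it assumes a plain java.util.HashMap as a stand-in for the sparse-vector backing map, and the class and method names (SparseTimesSketch, randomSparse, timesInPlace) are hypothetical. It multiplies two sparse vectors element-wise in place and counts how many entries must be removed because the product is zero.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class SparseTimesSketch {
    // Build a sparse vector with `nonZeros` nonzero entries at random
    // indices in [0, dimension). Values lie in [1, 2), so none is zero.
    static Map<Integer, Double> randomSparse(int dimension, int nonZeros, Random rnd) {
        Map<Integer, Double> v = new HashMap<>();
        while (v.size() < nonZeros) {
            v.put(rnd.nextInt(dimension), 1.0 + rnd.nextDouble());
        }
        return v;
    }

    // In-place element-wise product a := a * b (hypothetical helper).
    // An index missing from b gives a zero product, so its entry is
    // removed from a. Returns the number of removed entries.
    static int timesInPlace(Map<Integer, Double> a, Map<Integer, Double> b) {
        int before = a.size();
        a.entrySet().removeIf(e -> !b.containsKey(e.getKey()));
        a.replaceAll((index, value) -> value * b.get(index));
        return before - a.size();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Map<Integer, Double> a = randomSparse(1_000_000, 1_000, rnd);
        Map<Integer, Double> b = randomSparse(1_000_000, 1_000, rnd);

        // Two independent random vectors rarely share an index, so nearly
        // every entry ends up on the removal path.
        System.out.println("random x random, removed: " + timesInPlace(new HashMap<>(a), b));

        // Identical vectors share every index, so nothing is removed.
        System.out.println("identical, removed: " + timesInPlace(new HashMap<>(a), a));
    }
}
```

With 1,000 nonzeros spread over a million indices, a benchmark on two random vectors spends most of its time deleting entries, while one on identical vectors never deletes anything. A test with a partial, realistic overlap would sit between these two extremes.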



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
