mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-1780) Multi-threaded Matrix Multiplication is slower than Single-thread variant
Date Sun, 25 Oct 2015 11:37:27 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1780:
----------------------------------
    Description: 
Capturing the conversation on this subject here:

{code}

It turns out that matrix view traversal (of dense matrices, anyway) is 4 times slower than regular
matrix traversal in the same direction. I.e.:

Ad %*% Bd: (106.33333333333333,85.0)
Ad(r,::) %*% Bd: (356.0,328.0)

where r = 0 until Ad.nrow.

I investigated MatrixView: it reports the correct matrix flavor (the owner's), and the correct
algorithm is selected (the same as for the row above). Sure, MatrixView adds an indirection
(sometimes even a double indirection), but 4x? It should not be that much different from
transpose-view overhead, and transpose-view overhead is very small in these tests (compared
to the rest of the cost).

The main difference seems to be that the algorithm over plain matrices ends up doing a dot over
a DenseVector and a DenseVector (even though the wrapper object is created inside the row
iterations), whereas the inefficient algorithm does the same over VectorView wrappers. I wonder
whether VectorView has simply not been equipped to pass on the flavor of its backing vector to
the vector-vector optimization.

Apparently the dot algorithm on a vector view goes through the in-core vector-vector optimization
framework (it calls aggregate()), whereas DenseVector applies a custom iteration. Hence it may
boil down to experiments comparing avec dot bvec vs. avec(::) dot bvec(::).

{code}
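
For reference, here is a minimal sketch, in the math-scala DSL, of how the comparison above could be reproduced. The matrix size, seeds, object name, and timing helper are illustrative assumptions; the original posts do not give the benchmark setup.

{code}
import org.apache.mahout.math._
import scalabindings._
import RLikeOps._

object MatrixViewBench {
  // Illustrative size; the conversation does not state the dimensions used.
  val n = 500

  // Crude wall-clock timer, just to expose the order-of-magnitude gap.
  def timeMs(block: => Unit): Long = {
    val start = System.currentTimeMillis()
    block
    System.currentTimeMillis() - start
  }

  def main(args: Array[String]): Unit = {
    // Materialize two dense random matrices from functional random views.
    val Ad = Matrices.symmetricUniformView(n, n, 1234).cloned
    val Bd = Matrices.symmetricUniformView(n, n, 5678).cloned
    val r = 0 until Ad.nrow

    // Per the discussion above, the view should report its owner's flavor.
    println(s"same flavor: ${Ad.getFlavor == Ad(r, ::).getFlavor}")

    // Plain dense multiplication.
    println(s"Ad %*% Bd: ${timeMs(Ad %*% Bd)} ms")
    // Same product, but the left operand is a full-matrix MatrixView.
    println(s"Ad(r,::) %*% Bd: ${timeMs(Ad(r, ::) %*% Bd)} ms")
  }
}
{code}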

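Likewise, a sketch of the vector-level experiment proposed in the final paragraph of the conversation: dot over the dense vectors themselves vs. dot over full-length views of the same backing arrays. The vector length, repetition count, and object name are assumptions.

{code}
import org.apache.mahout.math._

object DotViewBench {
  // Illustrative length and repetition count.
  val n = 1000000
  val reps = 20

  def timeMs(block: => Unit): Long = {
    val start = System.currentTimeMillis()
    var i = 0
    while (i < reps) { block; i += 1 }
    System.currentTimeMillis() - start
  }

  def main(args: Array[String]): Unit = {
    // Dense random vectors.
    val avec = new DenseVector(Array.fill(n)(math.random))
    val bvec = new DenseVector(Array.fill(n)(math.random))

    // Full-length VectorViews over the same backing data, i.e. avec(::) and bvec(::).
    val avecView = avec.viewPart(0, n)
    val bvecView = bvec.viewPart(0, n)

    // DenseVector dot DenseVector takes the custom dense iteration.
    println(s"avec dot bvec: ${timeMs(avec dot bvec)} ms")
    // VectorView dot VectorView goes through aggregate().
    println(s"avec(::) dot bvec(::): ${timeMs(avecView dot bvecView)} ms")
  }
}
{code}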

> Multi-threaded Matrix Multiplication is slower than Single-thread variant
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1780
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1780
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.10.0, 0.10.1, 0.10.2, 0.11.0
>            Reporter: Suneel Marthi
>            Assignee: Dmitriy Lyubimov
>            Priority: Critical
>             Fix For: 0.12.0, 0.13.0
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
