mahout-dev mailing list archives

From "Sahil Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1582) Create simpler row and column aggregation API at local level
Date Mon, 16 Jun 2014 11:53:01 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032364#comment-14032364 ]

Sahil Sharma commented on MAHOUT-1582:
--------------------------------------

Hey,

Just to be clear, what you are talking about is this, right?
http://goo.gl/84dDBo

> Create simpler row and column aggregation API at local level
> ------------------------------------------------------------
>
>                 Key: MAHOUT-1582
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1582
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Ted Dunning
>
> The issue is that the current row and column aggregation API makes it difficult to do
> anything but row-by-row aggregation using anonymous classes.  There is no scope for being
> aware of locality, nor for using the well-known function definitions in Functions.  This
> makes many optimizations impossible, including ones that we want to have.  An example
> would be summing the absolute values of the elements.  With the current API, it would be
> very hard to optimize for sparse matrices or for the wrong direction of iteration, but
> with a different API this should be easy.
> What I suggest is an API of this form:
> {code}
>    Vector aggregateRows(DoubleDoubleFunction combiner, DoubleFunction mapper)
> {code}
> This will produce a vector with one element per row of the original matrix.  The nice
> thing here is that if the matrix is row major, we can iterate over rows and accumulate a
> value for each row, exploiting sparsity where available.  On the other hand, if the matrix
> is column major, we can keep a vector of accumulators and still exploit sparsity as
> appropriate.
> Sparsity becomes usable because the matrix code now has control over both of the loops
> involved and also has visibility into properties of the map and combine functions.  For
> instance, ABS(0) == 0, so if we combine with PLUS, we can use a sparse iterator.
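
The column-major case described in the issue can be illustrated with a small, self-contained Java sketch. It is not Mahout code: `ColumnMajorSparse`, the `zeroIsIdentity` flag, and the use of `java.util.function` interfaces in place of `DoubleDoubleFunction` and `DoubleFunction` are all assumptions made for illustration. The idea it tries to show is the one from the description: keep one accumulator per row, walk the columns, and visit only stored entries when the mapper sends 0 to 0 and the combiner treats 0 as an identity (as PLUS does).

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Illustrative sketch only; the names here are hypothetical, not Mahout's API. */
public class AggregateRowsSketch {

    /** Toy column-major sparse matrix: each column stores its non-zero rows and values. */
    public static final class ColumnMajorSparse {
        final int rows;
        final int[][] rowIdx;   // rowIdx[c][k] = row of the k-th stored entry of column c
        final double[][] vals;  // vals[c][k]   = value of that entry

        public ColumnMajorSparse(int rows, int[][] rowIdx, double[][] vals) {
            this.rows = rows;
            this.rowIdx = rowIdx;
            this.vals = vals;
        }

        int cols() {
            return rowIdx.length;
        }
    }

    /**
     * Returns one aggregated value per row. zeroIsIdentity is an assumed hint from
     * the caller that combiner(x, 0) == combiner(0, x) == x, as holds for PLUS.
     */
    public static double[] aggregateRows(ColumnMajorSparse m,
                                         DoubleBinaryOperator combiner,
                                         boolean zeroIsIdentity,
                                         DoubleUnaryOperator mapper) {
        double[] acc = new double[m.rows];  // one accumulator per row, initially 0.0

        // Sparse path: skipped zeros would map to 0 and leave the accumulator unchanged.
        if (zeroIsIdentity && mapper.applyAsDouble(0.0) == 0.0) {
            for (int c = 0; c < m.cols(); c++) {
                for (int k = 0; k < m.rowIdx[c].length; k++) {
                    int r = m.rowIdx[c][k];
                    acc[r] = combiner.applyAsDouble(acc[r], mapper.applyAsDouble(m.vals[c][k]));
                }
            }
            return acc;
        }

        // Dense fallback: every element, including zeros, must be mapped and combined.
        double[] column = new double[m.rows];
        for (int c = 0; c < m.cols(); c++) {
            Arrays.fill(column, 0.0);
            for (int k = 0; k < m.rowIdx[c].length; k++) {
                column[m.rowIdx[c][k]] = m.vals[c][k];
            }
            for (int r = 0; r < m.rows; r++) {
                double v = mapper.applyAsDouble(column[r]);
                acc[r] = (c == 0) ? v : combiner.applyAsDouble(acc[r], v);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // Matrix:  [ 1  0 -2 ]
        //          [ 0  0  0 ]
        //          [ 0 -3  4 ]
        ColumnMajorSparse m = new ColumnMajorSparse(
            3,
            new int[][] {{0}, {2}, {0, 2}},
            new double[][] {{1.0}, {-3.0}, {-2.0, 4.0}});

        // Sum of absolute values per row; takes the sparse path.
        double[] absSums = aggregateRows(m, Double::sum, true, Math::abs);
        System.out.println(Arrays.toString(absSums));  // prints [3.0, 0.0, 7.0]

        // Max of (x + 1) per row; the mapper does not preserve 0, so the dense path runs.
        double[] maxPlusOne = aggregateRows(m, Math::max, false, x -> x + 1.0);
        System.out.println(Arrays.toString(maxPlusOne));  // prints [2.0, 1.0, 5.0]
    }
}
```

In this sketch the caller supplies the combiner hint as a boolean; the issue text suggests the matrix code would instead read such properties from the function objects themselves.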



--
This message was sent by Atlassian JIRA
(v6.2#6252)
