mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1691) iterable of vectors to matrix
Date Tue, 16 Jun 2015 17:12:02 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588388#comment-14588388 ]

ASF GitHub Bot commented on MAHOUT-1691:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/138#issuecomment-112499826
  
    Alexey, there are a few problems here.
    
    
    As things stand, I believe a much more computationally efficient way to do this is
        block.cloned := {(r, c, v) => (v - mean(c)) / std(c)}
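A minimal sketch of the element-function form in plain Scala, so it runs without Mahout on the classpath. The helper `assignFn` and the sample `block`, `mean` and `std` values are hypothetical; the sketch only mimics the shape of `block.cloned := {(r, c, v) => ...}`, where the function receives each cell's row, column and value and returns the new value.

```scala
object StandardizeSketch {
  // Hypothetical helper: apply f(row, col, value) to every cell of a dense
  // row-major matrix and return a new matrix (the input is left untouched)
  def assignFn(m: Array[Array[Double]])(f: (Int, Int, Double) => Double): Array[Array[Double]] =
    m.zipWithIndex.map { case (row, r) =>
      row.zipWithIndex.map { case (v, c) => f(r, c, v) }
    }

  def main(args: Array[String]): Unit = {
    val block = Array(Array(1.0, 10.0), Array(3.0, 30.0))
    val mean = Array(2.0, 20.0) // per-column means (sample values)
    val std = Array(1.0, 10.0)  // per-column standard deviations (sample values)
    // Parenthesized so the column mean is subtracted before dividing;
    // v - mean(c) / std(c) would divide only the mean
    val z = assignFn(block)((r, c, v) => (v - mean(c)) / std(c))
    println(z.map(_.mkString(" ")).mkString("\n"))
  }
}
```

Because the function sees the column index, per-column statistics can be applied without first materializing rows as separate vectors.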
    
    (1) Creation followed by assignment is much slower.
    (2) Functional assignments take the matrix structure into account and avoid inefficient iteration directions. For example, if the block is really a column-wise sparse matrix consisting of sparse sequential columns, this row-by-row iteration is 10 to 100x slower than it needs to be (as demonstrated by #135).
    (3) This syntax already exists in the form of dense() or sparse() (if you want to assemble a matrix from a collection of row vectors).
    (4) Finally, this code most likely misses your intent, because row slices come from the iterator in an order that is not guaranteed. That is, iterator() may return row number 20 first, then 5, then 31, and so on. You assemble the matrix back in iteration order, which is probably not what you want. Note that iterators return MatrixSlice, not just a vector, and the slice has an index() method that indicates its true row ordinal.
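To illustrate point (2), here is a toy model in plain Scala, not Mahout's actual matrix classes, of a matrix stored as sparse columns. It counts how many cells each traversal touches: probing every (row, column) cell versus visiting only the stored nonzeros column by column. `ColumnSparse`, `rowWiseSum` and `structureAwareSum` are hypothetical names, and the counts are probe counts, not timings; they do not reproduce the 10 to 100x figure from #135.

```scala
object IterationDirectionSketch {
  // Hypothetical layout: each column keeps only its nonzeros as rowIndex -> value
  final case class ColumnSparse(nrow: Int, cols: IndexedSeq[Map[Int, Double]]) {
    def ncol: Int = cols.size
  }

  // Naive row-wise traversal: probes every (r, c) cell, i.e. nrow * ncol lookups
  def rowWiseSum(m: ColumnSparse): (Double, Long) = {
    var sum = 0.0
    var probes = 0L
    for (r <- 0 until m.nrow; c <- 0 until m.ncol) {
      probes += 1
      sum += m.cols(c).getOrElse(r, 0.0)
    }
    (sum, probes)
  }

  // Structure-aware traversal: visits only the stored nonzeros, column by column
  def structureAwareSum(m: ColumnSparse): (Double, Long) = {
    var sum = 0.0
    var visits = 0L
    for (col <- m.cols; v <- col.values) {
      visits += 1
      sum += v
    }
    (sum, visits)
  }

  def main(args: Array[String]): Unit = {
    // 1000 x 100 matrix with 5 nonzeros per column
    val cols = IndexedSeq.tabulate(100)(c => (0 until 5).map(k => (k * 200 + c) -> 1.0).toMap)
    val m = ColumnSparse(1000, cols)
    println(rowWiseSum(m))        // (500.0,100000): same sum, 100000 probes
    println(structureAwareSum(m)) // (500.0,500): same sum, 500 visits
  }
}
```

Both traversals return the same sum; only the number of cells touched differs, which is the kind of waste a structure-aware functional assignment can avoid.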



> iterable of vectors to matrix 
> ------------------------------
>
>                 Key: MAHOUT-1691
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1691
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.10.1
>            Reporter: Alexey Grigorev
>            Priority: Minor
>              Labels: math, scala
>
> In the Mahout Scala bindings, instead of writing
> {code}
> val res = drmX.mapBlock(drmX.ncol) {
>   case (keys, block) => {
>     val copy = block.like
>     copy := block.map(row => (row - mean) / std)
>     (keys, copy)
>   }
> }
> {code}
> I would like to be able to write 
> {code}
> val res = drmX.mapBlock(drmX.ncol) {
>   case (keys, block) => {
>     keys -> block.map(row => (row - mean) / std)
>   }
> }
> {code}
> Solution: add a method for implicit conversion from iterable to Matrix:
> {code}
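>   // Assumes a non-empty collection of equally sized rows; the matrix type follows the first row
>   implicit def iterable2Matrix(that: Iterable[Vector]): Matrix = {
>     require(that.nonEmpty, "cannot build a matrix from an empty collection")
>     val first = that.head
>     val nrow = that.size
>     val ncol = first.size
>     val m = if (first.isDense) {
>       new DenseMatrix(nrow, ncol)
>     } else {
>       new SparseRowMatrix(nrow, ncol)
>     }
>     // Row idx of the result receives the idx-th vector in iteration order
>     that.zipWithIndex.foreach { case (row, idx) =>
>       m.assignRow(idx, row)
>     }
>     m
>   }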
> {code}
> If it sounds nice, I can send a pull request with this implemented
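Point (4) suggests placing each row by its own ordinal instead of by its position in the iteration. A minimal sketch of that idea in plain Scala follows; `Slice` is a hypothetical stand-in for a row slice that carries an `index`, analogous to what the comment says about `MatrixSlice.index()`, and `assemble` is likewise hypothetical, not a Mahout API. Rows that never arrive stay zero.

```scala
object AssembleByIndexSketch {
  // Hypothetical row slice that knows its own row ordinal
  final case class Slice(index: Int, values: Array[Double])

  // Copy each slice into the row named by its index, ignoring arrival order
  def assemble(slices: Iterable[Slice], nrow: Int, ncol: Int): Array[Array[Double]] = {
    val m = Array.fill(nrow, ncol)(0.0)
    slices.foreach { s =>
      require(s.index >= 0 && s.index < nrow, s"row index ${s.index} out of range")
      require(s.values.length == ncol, s"row ${s.index} has ${s.values.length} columns, expected $ncol")
      Array.copy(s.values, 0, m(s.index), 0, ncol)
    }
    m
  }

  def main(args: Array[String]): Unit = {
    // Slices arrive out of order, as an unordered iterator might yield them
    val slices = Seq(Slice(2, Array(5.0, 6.0)), Slice(0, Array(1.0, 2.0)), Slice(1, Array(3.0, 4.0)))
    val m = assemble(slices, nrow = 3, ncol = 2)
    println(m.map(_.mkString(" ")).mkString("\n"))
  }
}
```

Keying on the slice index makes the result independent of iteration order, whereas `zipWithIndex` ties row placement to whatever order the iterator happens to produce.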



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
