mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-6) Need a matrix implementation
Date Fri, 07 Mar 2008 17:27:46 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576283#action_12576283
] 

Ted Dunning commented on MAHOUT-6:
----------------------------------


Hashmaps in Java are surprisingly fast (very nearly 1 array access).

Their real cost is memory size and locality of reference.  Rennies sorted
index suggest is very tight on memory and is good for sequential accesses.
Random access isn't all *that* bad since binary search is available.  My
experience is similar to his... sequential scanning is vastly more important
than random access.

One point I would suggest, however, is to allow the vector to be unsorted
until it needs to be in order.  That allows fast filling of the vector
followed by a single sort the first time sequential scanning or random
access is done.






> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, MAHOUT-6d.diff,
MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different reducers
> g) a reasonable set of matrix operations should be supported, these should eventually
include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations,
A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message