mahout-dev mailing list archives

From Ted Dunning
Subject Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation
Date Sun, 16 Mar 2008 21:28:36 GMT

I have been batting that question back and forth in my own head recently.

It IS absolutely a huge help to have labels.  R has the data.frame to do
this and it helps enormously.  I have done it in some applications and it
saved endless hassle.

On the other hand, there is a real danger that the label functionality would
get sucked into a single implementation.  Labels really are an orthogonal
concern that is (or should be) independent of how the matrix is implemented.

So should there really be something like a LabeledMatrix wrapper that
provides this labeling service to any matrix?
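
To make the wrapper idea concrete, here is a minimal Java sketch.  Every name
in it (the Matrix interface, DenseMatrix, LabeledMatrix and their methods) is
an assumption made up for illustration, not the API in any of the MAHOUT-6
patches.  The point is only that the labels live in the wrapper, so they work
with any matrix implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal matrix interface; NOT Mahout's actual API.
interface Matrix {
    int rows();
    int columns();
    double get(int row, int column);
    void set(int row, int column, double value);
}

// One possible backing store (dense, row-major). A sparse class could
// implement the same interface and be wrapped in exactly the same way.
class DenseMatrix implements Matrix {
    private final int rows;
    private final int columns;
    private final double[] values;

    DenseMatrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.values = new double[rows * columns];
    }

    public int rows() { return rows; }
    public int columns() { return columns; }
    public double get(int row, int column) { return values[row * columns + column]; }
    public void set(int row, int column, double value) { values[row * columns + column] = value; }
}

// Wrapper that adds row and column labels to any Matrix without knowing
// how the wrapped matrix stores its data.
class LabeledMatrix implements Matrix {
    private final Matrix delegate;
    private final Map<String, Integer> rowIndex = new HashMap<>();
    private final Map<String, Integer> columnIndex = new HashMap<>();

    LabeledMatrix(Matrix delegate, List<String> rowLabels, List<String> columnLabels) {
        if (rowLabels.size() != delegate.rows() || columnLabels.size() != delegate.columns()) {
            throw new IllegalArgumentException("Label counts must match matrix dimensions");
        }
        this.delegate = delegate;
        index(rowLabels, rowIndex);
        index(columnLabels, columnIndex);
    }

    // Build the label -> position map, rejecting duplicate labels.
    private static void index(List<String> labels, Map<String, Integer> target) {
        for (int i = 0; i < labels.size(); i++) {
            if (target.put(labels.get(i), i) != null) {
                throw new IllegalArgumentException("Duplicate label: " + labels.get(i));
            }
        }
    }

    private static int lookup(Map<String, Integer> index, String label) {
        Integer position = index.get(label);
        if (position == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return position;
    }

    // Label-based access is the only thing the wrapper adds.
    public double get(String row, String column) {
        return delegate.get(lookup(rowIndex, row), lookup(columnIndex, column));
    }

    public void set(String row, String column, double value) {
        delegate.set(lookup(rowIndex, row), lookup(columnIndex, column), value);
    }

    // Index-based access passes straight through to the wrapped matrix.
    public int rows() { return delegate.rows(); }
    public int columns() { return delegate.columns(); }
    public double get(int row, int column) { return delegate.get(row, column); }
    public void set(int row, int column, double value) { delegate.set(row, column, value); }
}
```

Because LabeledMatrix itself implements Matrix in this sketch, code that only
knows about indices keeps working, while code that cares about labels can call
the String overloads.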

On 3/16/08 2:23 PM, "Grant Ingersoll (JIRA)" wrote:

> Grant Ingersoll commented on MAHOUT-6:
> --------------------------------------
> Does it make sense to be able to assign labels to the rows and columns and
> maybe even have it accessible as a map?  For instance, I think I could use
> these for the bayesian classifier implementation I am working on and it would
> make sense to be able to label the features and the labels.  Naturally, I can
> store the information elsewhere as well, but didn't know whether it made sense
> to keep the info w/ the matrix.
>> Need a matrix implementation
>> ----------------------------
>>                 Key: MAHOUT-6
>>                 URL:
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Ted Dunning
>>            Assignee: Grant Ingersoll
>>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff,
>> MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff,
>> MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff,
>> MAHOUT-6l.patch
>> We need matrices for Mahout.
>> An initial set of basic requirements includes:
>> a) sparse and dense support are required
>> b) row and column labels are important
>> c) serialization for hadoop use is required
>> d) reasonable floating point performance is required, but awesome FP is not
>> e) the API should be simple enough to understand
>> f) it should be easy to carve out sub-matrices for sending to different
>> reducers
>> g) a reasonable set of matrix operations should be supported, these should
>> eventually include:
>>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra
>> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>>     row and column sums
>>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u +
>> beta v
>> h) easy and efficient iteration constructs, especially for sparse matrices
>> i) easy to extend with new implementations
