mahout-dev mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA
Date Mon, 12 Dec 2011 08:07:06 GMT
The person using this job knows the right vector type to use. The job
may take in many sparse vectors but produce a dense one, or a vector
that writes to a database, or something else. In fact, I may just want
to convert a vector from dense to sparse, and I could do that with
this job.
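
A rough sketch of the idea above, not code from the MAHOUT-923 patch: the mean row is accumulated in one form, and the caller decides which representation comes out at the end. Plain Java arrays and maps stand in here for Mahout's dense and sparse vectors, and the names RowMeanSketch, meanRow, and toSparse are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for a row mean job; no Hadoop or Mahout classes are used.
class RowMeanSketch {

    // Column-wise mean of sparse rows (column index -> value), accumulated densely.
    static double[] meanRow(List<Map<Integer, Double>> rows, int numCols) {
        double[] sum = new double[numCols];
        for (Map<Integer, Double> row : rows) {
            for (Map.Entry<Integer, Double> e : row.entrySet()) {
                sum[e.getKey()] += e.getValue();
            }
        }
        if (!rows.isEmpty()) {
            for (int i = 0; i < numCols; i++) {
                sum[i] /= rows.size();
            }
        }
        return sum;
    }

    // Same computation, but the caller supplies the output representation.
    static <T> T meanRow(List<Map<Integer, Double>> rows, int numCols,
                         Function<double[], T> outputFactory) {
        return outputFactory.apply(meanRow(rows, numCols));
    }

    // One possible output form: keep only the non-zero entries.
    static Map<Integer, Double> toSparse(double[] dense) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                sparse.put(i, dense[i]);
            }
        }
        return sparse;
    }
}
```

Passing RowMeanSketch::toSparse as the last argument yields a sparse result, while passing an identity function keeps it dense; a converter that writes to a database would plug in the same way.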

On Mon, Dec 12, 2011 at 12:06 AM, Lance Norskog <goksron@gmail.com> wrote:
> To use a combiner, TupleWritable should be fine. I have not used it.
>
> But it will copy the entire vector. You would want to minimize this.
> If this is a big problem, you can do an ugly trick: you store the
> counter as the key value, but make a custom Writable that always
> returns 'this equals the other'. So, all of your counters have the
> same key and thus all vectors go to the same reducer.
>
>
>
> On Sun, Dec 11, 2011 at 8:14 PM, Raphael Cendrillon (Commented) (JIRA)
> <jira@apache.org> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167341#comment-13167341 ]
>>
>> Raphael Cendrillon commented on MAHOUT-923:
>> -------------------------------------------
>>
>> Thanks Lance. A combiner is definitely the next step. One question, is there already
>> a writable for tuples of e.g. int and Vector, or should I just write one from scratch? I know
>> there is TupleWritable, but from what I've read online it's better to avoid that unless you're
>> doing a multiple input join.
>>
>> Regarding the class for the output vector, are you saying that instead of inheriting
>> the class from the rows of the DistributedRowMatrix you'd rather be able to specify this manually?
>>
>>
>>
>>> Row mean job for PCA
>>> --------------------
>>>
>>>                 Key: MAHOUT-923
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-923
>>>             Project: Mahout
>>>          Issue Type: Improvement
>>>          Components: Math
>>>    Affects Versions: 0.6
>>>            Reporter: Raphael Cendrillon
>>>            Assignee: Raphael Cendrillon
>>>             Fix For: Backlog
>>>
>>>         Attachments: MAHOUT-923.patch
>>>
>>>
>>> Add map reduce job for calculating mean row (column-wise mean) of a Distributed
>>> Row Matrix for use in PCA.
>>
>>
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com



-- 
Lance Norskog
goksron@gmail.com
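
For the combiner question in the quoted thread, here is a minimal sketch of a value type that carries an int count together with a partial sum, so partial results can be merged in any order before the final division. It is an illustration under assumptions, not the patch's code: CountedSum and its methods are hypothetical names, a double[] stands in for a Mahout Vector, and the write/readFields pair only mirrors the method shapes of Hadoop's Writable interface without depending on Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical (count, partial sum) pair. A Hadoop version would declare
// "implements org.apache.hadoop.io.Writable" and hold a Mahout Vector.
class CountedSum {
    int count;
    double[] sum;

    CountedSum(int count, double[] sum) {
        this.count = count;
        this.sum = sum;
    }

    // Map side: a single row contributes a count of one.
    static CountedSum ofRow(double[] row) {
        return new CountedSum(1, row.clone());
    }

    // Combiner and reducer: add counts and add sums element-wise.
    CountedSum merge(CountedSum other) {
        double[] merged = sum.clone();
        for (int i = 0; i < merged.length; i++) {
            merged[i] += other.sum[i];
        }
        return new CountedSum(count + other.count, merged);
    }

    // Final reducer step: divide the total sum by the total count.
    double[] mean() {
        double[] m = new double[sum.length];
        for (int i = 0; i < m.length; i++) {
            m[i] = sum[i] / count;
        }
        return m;
    }

    // Same shape as Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeInt(count);
        out.writeInt(sum.length);
        for (double v : sum) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        count = in.readInt();
        sum = new double[in.readInt()];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = in.readDouble();
        }
    }

    // Convenience round-trip helpers for trying the serialization locally.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static CountedSum fromBytes(byte[] data) {
        try {
            CountedSum result = new CountedSum(0, new double[0]);
            result.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because merge only adds counts and sums, a combiner can apply it to whatever subset of rows it sees, and the reducer still gets the exact mean from mean() after merging the partials. The vector is still copied on each serialization, which is the cost the quoted message warns about.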
