mahout-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-923) Row mean job for PCA
Date Mon, 12 Dec 2011 02:10:32 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167312#comment-13167312 ]

jiraposter@reviews.apache.org commented on MAHOUT-923:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3147/#review3838
-----------------------------------------------------------


Hm. I hope I did not misread the code or miss something.

1 -- I am not sure this will actually work as intended unless the # of reducers is coerced to 1,
of which I see no mention in the code.
2 -- mappers do nothing, passing all the row pressure on to the sort, which is absolutely not necessary,
even if you use combiners. This is going to be especially the case if you coerce 1 reducer
and no combiners. IMO mean computation should be pushed up to the mappers to avoid the sort pressure
of map reduce. Then reduction becomes largely symbolic (but you do need to pass on the # of
rows each mapper has seen to the reducer, in order for that operation to apply correctly).
3 -- I am not sure: is NullWritable as a key legit? In my experience the sequence file reader
cannot instantiate it, because NullWritable is a singleton and its creation is prohibited by
making the constructor private.
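
To make point 2 concrete, here is a minimal sketch in plain Java with no Hadoop
dependencies. It is not the code from the patch: the names RowMeanSketch, Partial,
mapSplit and reduce are hypothetical, and each "mapper" is simulated as a method
call over an in-memory split. The idea is that each mapper emits one column-wise sum
plus the # of rows it has seen, and the reducer only merges those partials and divides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the MAHOUT-923 patch.
public class RowMeanSketch {

    /** What one mapper would emit: a column-wise sum plus the number of rows it saw. */
    static final class Partial {
        final double[] sum;
        long rows;

        Partial(int columns) {
            this.sum = new double[columns];
        }

        void addRow(double[] row) {
            for (int j = 0; j < sum.length; j++) {
                sum[j] += row[j];
            }
            rows++;
        }

        void merge(Partial other) {
            for (int j = 0; j < sum.length; j++) {
                sum[j] += other.sum[j];
            }
            rows += other.rows;
        }
    }

    /** Mapper side: fold a whole input split into a single Partial. */
    static Partial mapSplit(List<double[]> split, int columns) {
        Partial p = new Partial(columns);
        for (double[] row : split) {
            p.addRow(row);
        }
        return p;
    }

    /** Reducer side: merge the partials, then divide by the total row count. */
    static double[] reduce(List<Partial> partials, int columns) {
        Partial total = new Partial(columns);
        for (Partial p : partials) {
            total.merge(p);
        }
        double[] mean = new double[columns];
        for (int j = 0; j < columns; j++) {
            // An empty input yields zeros here; a real job might prefer to fail instead.
            mean[j] = total.rows == 0 ? 0.0 : total.sum[j] / total.rows;
        }
        return mean;
    }

    public static void main(String[] args) {
        // Two splits of unequal size, as two mappers might see them.
        List<double[]> splitA = Arrays.asList(
                new double[] {1, 2}, new double[] {3, 4}, new double[] {5, 6});
        List<double[]> splitB = Collections.singletonList(new double[] {7, 8});

        List<Partial> partials = new ArrayList<>();
        partials.add(mapSplit(splitA, 2));
        partials.add(mapSplit(splitB, 2));

        System.out.println(Arrays.toString(reduce(partials, 2))); // prints [4.0, 5.0]
    }
}
```

Note that averaging the two per-split means directly would give [5.0, 6.0], which is
wrong because the splits have 3 rows and 1 row. That is why the row count has to travel
with the sum. The sketch also assumes all partials end up in one reducer, as in point 1.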

- Dmitriy


On 2011-12-12 00:30:24, Raphael Cendrillon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3147/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-12 00:30:24)
bq.  
bq.  
bq.  Review request for mahout.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's a patch with a simple job to calculate the row mean (column-wise mean). One outstanding
issue is the combiner: this requires a writable class IntVectorTupleWritable, where the Int
stores the number of rows, and the Vector stores the column-wise sum.
bq.  
bq.  
bq.  This addresses bug MAHOUT-923.
bq.      https://issues.apache.org/jira/browse/MAHOUT-923
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1213095

bq.    /trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMeanJob.java PRE-CREATION

bq.    /trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1213095
bq.  
bq.  Diff: https://reviews.apache.org/r/3147/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Junit test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Raphael
bq.  
bq.


                
> Row mean job for PCA
> --------------------
>
>                 Key: MAHOUT-923
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-923
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Raphael Cendrillon
>            Assignee: Raphael Cendrillon
>             Fix For: Backlog
>
>         Attachments: MAHOUT-923.patch
>
>
> Add map reduce job for calculating mean row (column-wise mean) of a Distributed Row Matrix
for use in PCA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
