incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-500) Add random matrix generator.
Date Fri, 13 Apr 2012 09:58:13 GMT

    [ https://issues.apache.org/jira/browse/HAMA-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253251#comment-13253251
] 

Thomas Jungblut commented on HAMA-500:
--------------------------------------

Hi Mikalai,

I answer your questions first before my massive feedback:

bq. 1) Is this algorithm is something you have expected?

For me this is enough.

bq. 2) Have I choose the right command line format?

Yes, it's better than what we've used before ;)

bq. 3) Is exception handling appropriate for Hama?

Yeah, but it would be cool if you can output the given value as well as the error message
if you throw IllegalArgumentExceptions. 

bq. 4) Is documentation style appropriate for Hama?

Yes, it is much more than we usually document, however there are a few mistakes I explain
you in the following paragraphs.

@author tags are discouraged at ASF, it's about collective code ownership and to secure you
from getting death threads when you introduce bugs. 
And empty @param or @return tags are not so good as well, either fill them or remove them.

You are using a lot of static configuration values, they won't be distributed in a clustering
environment.
Hama is distributing the jar with the class and instantiates it on each host reflecti-o-magically.
So all the numbers will stay default even if you set them in the main method (rows and columns
for example). 
Therefore you have the configuration (HamaConfiguration conf in your code). You can set values
for rows and columns like this:

{noformat}
   conf.setInt("matrix.rows", 10);
   conf.setInt("matrix.columns", 10);
{noformat}

and retrieve it via

{noformat}
   // 10 is default here if the matrix.rows property wasn't set anywhere
   int rows = conf.getInt("matrix.rows", 10);
   int cols = conf.getInt("matrix.columns", 10);
{noformat}

The conf will be serialized to XML and distributed with the jar and read once a task starts.

Then I would recommend you to check for illegal values in the main method or when you parse
the args instead of setters for static variables. 

And the output should be the matrix itself in a format which can be used for further processing,
e.G. the Sparse-Matrix-Vector Multiplication. So no text about the statistics, therefore you
can use the Counters in the BSPPeer class.

General Note and not really related to your code, but I guess we should use a math lib to
have sparse vectors and stuff (e.G. mahout-math). They have better HashMap implementations
that are designed for primitives (int/double pairs). 

But the algorithm itself seems fine, I have not tested it yet though. 

Good amount of work! Would you please fix my suggestions and upload a new patch? (name it
HAMA-500_1.patch or so). Thanks!

                
> Add random matrix generator.
> ----------------------------
>
>                 Key: HAMA-500
>                 URL: https://issues.apache.org/jira/browse/HAMA-500
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>            Reporter: Edward J. Yoon
>            Assignee: Mikalai Parafeniuk
>              Labels: newbie
>         Attachments: HAMA-500.patch
>
>
> In this issue, we add random matrix generator to produce large data sets that can be
used for graph or matrix examples. This can be implemented using BSP job.
> It'd be nice if sparsity, symmetry, and size can be configurable via the command line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message